HUMAN PROCESSING OF EQUIVOCAL INFORMATION


ABSTRACT 

This report contains a series of studies investigating the abilities of subjects to revise probability estimates on the basis of new information. These studies show that subjects' probability estimates are reliable but deviate considerably from posterior probabilities calculated from Bayes's theorem. The deviations are almost always in the conservative direction, i.e., low Bayesian probabilities are overestimated and high ones are underestimated. Only when each datum is very ambiguous do subjects' estimates become more extreme than Bayesian probabilities. Further, when subjects are asked to give 90% or 50% credible intervals of a posterior probability distribution, their intervals are wider than Bayesian credible intervals. This finding of conservatism has led to the design of a man-computer system that should minimize the effects of human shortcomings in reaching diagnoses.


1. INTRODUCTION: A SCIENTIFIC OVERVIEW*

This is the final report of a three-year program of research into human information processing and decision making, applying techniques based on Bayes's probability theorem to the design of man-machine systems for information processing (Bayes's theorem is explained in detail in Section 2). Appendix B briefly summarizes the publications already in print or in press that have developed from the contract. In order to keep this final report to reasonable length, no attempt will be made to duplicate information contained in the publications summarized in Appendix B; the primary purpose here is to report research completed under Contract AF 19(604)-7393 but not yet published.

Certain activities conducted under this contract cannot be properly reflected in a final report. One of them is the development of research plans for semisimulation experiments concerned with probabilistic information-processing systems. These research plans occupied a great deal of time and attention during the last 18 months of the contract, but have not reached fruition as yet. This work is being continued under Contract AF 19(628)-2823, and will appear in publications sponsored by that contract.

A second class of activity that cannot be adequately reflected in this final report is the proceedings of a conference on Bayesian Information Processing held at the University of Michigan in May, 1963. Participants in this conference exchanged information about research on men as Bayesian information processors and about the design and evaluation of Bayesian information-processing systems. No formal publications were intended to result from this conference; its purpose was to facilitate informal interchange of information and to bring different researchers concerned with related problems into interaction with one another, so that their research would be coordinated and would form a more cohesive whole than might otherwise be possible.

*This Section was prepared by Ward Edwards.

Research conducted under Contract AF 19(604)-7393 was of three major kinds. One consisted of primarily theoretical investigations into the formal characteristics of Bayes's theorem as a mathematical model for the revision of opinion in the light of information. This work culminated in a long article about the relevance of Bayesian ideas to statistics, another concerning the optional stopping problem, and several minor efforts. A second kind of research consisted of laboratory studies comparing man and Bayes's theorem as information processors, and finding that man is the more conservative; a number of studies elaborated this finding and examined some of the parameters that affected it. The third kind of endeavor under the program was the development, elaboration, and thinking-through of an idea for a Bayesian information-processing system, or PIP, followed by the development of semisimulation research techniques for the exploration and validation of that idea. Of the three classes of research, this one had the least visible product, since it consisted primarily of intellectual effort, mostly of a nonpublishable nature. Nevertheless, this class of endeavor seems likely to have the greatest impact in the long run on military technology and Air Force system design.

This scientific overview, which is really nothing more than a brief introduction both to the publications that have already emerged from this contract and to the chapters that follow, will ignore the first kind of effort completely. The formal work on Bayesian statistics and optional stopping has been fully reported in publications, stands on its own feet, and needs neither amplification nor review. The summaries of publications in Appendix B briefly report what was done.

The main research endeavor of the contract was concerned with the comparison of men with Bayes's theorem as probabilistic information processors. When the contract began, essentially no information about the quality of human information processing in unspeeded tasks was available. It was widely supposed that men were good information processors, but little was known about how good, mostly because there was no formal model for proper information-processing methods. The research started from the premise that Bayes's theorem was an optimal model for information processing, and consequently that straightforward experiments comparing human performance with the output of Bayes's theorem might lead to insights regarding the quality of human information processing. Thus the experiment described in Section 2 was designed as a frontal attack on the problem. It used a very complicated task involving 4 mutually exclusive hypotheses and 12 different possible observations, displayed to the subjects a set of 48 probabilities of the data given the hypotheses (one for each hypothesis-observation pair), and then required the subjects to generate posterior probability estimates. Not too surprisingly, it turned out that their estimates differed from Bayesian probabilities. What was more interesting, however, was that these differences were consistently in the direction that we have come to call conservative; that is, subjects consistently overestimated low Bayesian posterior probabilities and underestimated high posterior probabilities. No subject extracted from the data anything approaching the certainty it would justify.

It seemed entirely possible to us that these deviations from Bayesian probabilities, overwhelming in size and consistency though they were, might have been attributable to artifacts of one kind or another. Consequently, we designed two experiments intended to examine two artifacts that we thought might be relevant, and happily found that neither artifact need be considered too seriously. In one of these experiments (Section 3.1) we asked whether subjects might have been confused by the fact that in the original experiment the data did not resemble any of the distributions of data to be expected with the given hypotheses: it turned out that this made no difference whatever. In the other experiment we examined the effects of sequential and nonsequential presentation of data and found much the same amount of conservatism, whether the subject in effect started from scratch each time or was allowed to retain his previous posterior probability estimates for use as prior probabilities, with only an incremental datum added.

In another study (H. C. A. Dale, unpublished), subjects were allowed to specify their own values of the probability of the datum, given the hypothesis [P(D|H)]; still they were conservative. It seems that whether or not the value of P(D|H) conforms to the subject's intuitive appraisal of what it ought to be makes very little difference to his information-processing performance.

Still another study (Section 3) raised the question of whether it is ever possible to get subjects to overestimate a posterior probability. It turns out that the answer is yes, if the information given to the subject is sufficiently worthless for diagnostic purposes. That is, when Bayesian posterior probabilities are very little different from Bayesian prior probabilities, a subject's estimates of posterior probabilities usually are more extreme.

Next we turned our attention to the question of whether this conservatism could be found also in much simpler, more straightforward kinds of experiments. In one such experiment (Section 4), we presented subjects with observations drawn from a normal distribution and required them to estimate a posterior credible interval for the mean. Findings from this study were entirely consistent with the findings from the previous, more complicated study: subjects were consistently conservative. The task of estimating a credible interval, however, is unfamiliar to subjects, and their estimates were rather variable.
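The interval task can be made concrete with a minimal sketch. It assumes, purely for illustration, that the population standard deviation is known and that the prior on the mean is flat; the prior actually used in Section 4 is not stated here. Under those assumptions the posterior for the mean is normal, centered on the sample mean with standard deviation sigma divided by the square root of n, so a central credible interval follows from normal quantiles. The function name `credible_interval` and the data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def credible_interval(observations, sigma, level):
    """Central credible interval for the mean of a normal population.

    Hypothetical assumptions: sigma is known and the prior on the mean is
    flat, so the posterior for the mean is Normal(sample mean, sigma**2 / n).
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    center = fmean(observations)
    spread = sigma / sqrt(n)
    # z-value leaving (1 - level) / 2 of the posterior mass in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * spread, center + z * spread


# Illustrative observations, not data from the experiment.
obs = [101.2, 98.7, 100.4, 99.9]
i90 = credible_interval(obs, sigma=2.0, level=0.90)
i50 = credible_interval(obs, sigma=2.0, level=0.50)
print(i90)
print(i50)
```

In these terms, a conservative subject is one who reports an interval wider than the one computed for the same level and the same observations.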

Finally, we sought the simplest possible task in which subjects could perform this kind of information processing (Section 5). We ended by using a simple binomial task in which subjects must decide which of two hypotheses about the percentage of red poker chips in a bookbag full of red and blue poker chips is correct. Here, too, we obtained conservatism, though not quite so much of it as in the experiment reported in Section 2. The data indicated very clearly that even in this simplest of all possible Bayesian tasks, subjects are unable to extract from information all the certainty that is latent in it. This situation seemed appropriate for further study, so we designed a number of experiments, many of them still incomplete, examining various of its parameters. One, sufficiently complete for inclusion in this report (Section 5), compared three different response modes: estimating probabilities on a device displaying a linear scale of probabilities, estimating odds verbally, and estimating odds on a device displaying a logarithmic scale of odds. It had seemed possible that one reason for conservative performance was that the probability scale is bounded at zero and one, and subjects are consequently reluctant to come too close to the boundaries at which they have no more freedom to move. However, the experiment on response modes indicates clearly that this factor, while relevant, is not the primary cause of conservatism. The two odds groups show less conservatism than the probability group but still plenty of it; the logarithmic scale seems to produce only slightly less conservatism than the direct verbal reporting of odds. Research of this kind continues.
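The bookbag task and the three response scales can be illustrated with a short sketch. The proportion of red chips (0.7), equal prior odds, and draws treated as conditionally independent given the bag are all assumptions made for illustration, not the parameters of the Section 5 experiment; the function names are hypothetical. For two symmetric bags, each red chip multiplies the odds by p/(1 - p) and each blue chip divides them by the same factor, so only the difference between red and blue counts matters.

```python
from math import log10


def bookbag_posterior_odds(reds, blues, p_red=0.7, prior_odds=1.0):
    """Posterior odds that the predominantly red bag is being sampled.

    Hypothetical setup: one bag has a proportion p_red of red chips, the
    other has p_red blue chips, and draws are conditionally independent.
    """
    likelihood_ratio = p_red / (1.0 - p_red)  # per red chip; inverse per blue chip
    return prior_odds * likelihood_ratio ** (reds - blues)


def odds_to_probability(odds):
    return odds / (1.0 + odds)


for reds, blues in [(3, 2), (6, 4), (9, 6), (12, 8)]:
    odds = bookbag_posterior_odds(reds, blues)
    print(f"{reds} red, {blues} blue: P = {odds_to_probability(odds):.4f}, "
          f"odds = {odds:.2f}, log10 odds = {log10(odds):.3f}")
```

On the log-odds scale each net red chip adds the same fixed step, whereas on the probability scale successive steps shrink as the estimate approaches the bound at one; that boundedness is the factor the response-mode comparison was designed to examine.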

The fundamental finding of the first study has required no qualification or modification as a result of its amplification by these further experiments. The basic bias seems to be strong and very nearly universal (at least among college students), although of course the magnitude of the effect is influenced by a variety of such peripheral factors as response modes, presence or absence of payoffs, complexity of the task, amount of training received, presence or absence of feedback concerning the correct hypothesis, etc. And it is appropriate to ask what effects this consistent bias in human behavior might have on such practical problems as the design of information-processing systems.

Existing systems intended for processing information in decision making, such as command and control systems, may be extremely sophisticated in their information gathering, display, and communications. But their technique for processing the information obtained is identical with that used by Alexander the Great: display it to the commander and let him decide. Clearly, any bias that the commander may bring to his process of deciding will be a bias in the operation of the system. It seems very likely, in view of the research findings and also on intuitive grounds, that commanders have a conservative bias in such systems; that is, that they are unable to extract all the certainty from the data that the data would justify. Therefore one problem in the design of information-processing and decision-making systems may be defined as the problem of how to prevent the natural conservatism of human information processing from making such systems less responsive than they should be.

One step toward the solution of that problem consists of analyzing information processing into subtasks. It seems clear that at least two such subtasks can be discriminated. One consists of assessing the impact of a single item of information on some hypothesis, or set of hypotheses, of interest to the system. The other consists of aggregating these impacts over data and over hypotheses into a picture of the current status of the hypotheses. The first of these tasks, for the kinds of qualitative information that are characteristically available to information-processing systems, must inevitably be performed by human beings. The second is naturally performed by Bayes's theorem and consequently is easy to mechanize.

The essence of the proposal for the design of a probabilistic information-processing system that has emerged from the research of this project is that human beings should estimate the probability of each datum given each hypothesis, P(D|H), or some closely related quantity, such as a set of likelihood ratios, and a machine should be used to aggregate these estimates into a posterior distribution over all hypotheses of interest to the system.
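The division of labor proposed above can be sketched as follows: human estimates of P(D|H), or likelihood ratios, go in, and the machine performs the aggregation with Bayes's theorem. This is a minimal sketch rather than the system design itself; it assumes the data are conditionally independent given each hypothesis, and the function name `aggregate`, the hypothesis labels, and the numbers are hypothetical.

```python
def aggregate(prior, judgments):
    """Machine half of a probabilistic information-processing sketch.

    prior: dict mapping hypothesis name -> prior probability.
    judgments: one dict per datum, mapping hypothesis name -> a human
        estimate of P(D|H) for that datum. Only ratios between entries
        matter, so likelihood ratios relative to any reference hypothesis
        may be supplied instead.
    Assumes the data are conditionally independent given each hypothesis.
    """
    posterior = dict(prior)
    for judged in judgments:
        # Bayes's theorem: multiply by the judged likelihood, then renormalize.
        posterior = {h: posterior[h] * judged[h] for h in posterior}
        total = sum(posterior.values())
        if total == 0:
            raise ValueError("the judgments rule out every hypothesis")
        posterior = {h: v / total for h, v in posterior.items()}
    return posterior


prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
# Judged likelihood ratios for two data, each expressed relative to H2.
judgments = [
    {"H1": 2.0, "H2": 1.0, "H3": 0.5},
    {"H1": 3.0, "H2": 1.0, "H3": 1.0},
]
posterior = aggregate(prior, judgments)
print({h: round(p, 3) for h, p in posterior.items()})
```

Because every update is renormalized, multiplying all entries of one judgment by the same constant leaves the posterior unchanged, which is why likelihood ratios serve as well as the P(D|H) values themselves.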

If human estimators are naturally less conservative in estimating P(D|H) than in estimating posterior probabilities, or can be taught to be so, then this procedure should solve the problem of conservative bias. In any case, it seems attractive on other grounds: it permits fragmentation of the task of information processing into many subtasks that can be parceled out among men and machines in a manner respecting the capabilities of each. Moreover, it permits full mechanization of what is, from the human point of view, basically a bookkeeping task: the aggregation of data into posterior distributions.

The research problems of specifying and evaluating this idea are numerous and very difficult. One problem asks how men can be selected, trained, and provided with suitable displays and controls so that they can work effectively as estimators of P(D|H) or a related quantity. Another, which assumes that trained men can provide suitable estimates, asks how a system can be designed to exploit that fact. Only the first of these two questions has been of primary interest in the research program of this contract; the other is the business of Contract AF 19(628)-2823.

Simple, short-term, inexpensive laboratory experiments are incapable of studying probability estimation in really complex situations under full experimental control. Either the expertise of the subjects and the context in which they estimate probabilities must be artificially created in the laboratory, in which case the expertise cannot be very deep nor the context complicated, or else contexts and abilities pre-existing in the real world must be studied. The former procedure obviously falls short of examining performance under realistically complex circumstances; the latter procedure sounds more attractive. Unfortunately, it is almost impossible to determine the "correct" probabilities in real-world contexts. It is also nearly impossible to ensure that different subjects have comparable amounts of information about the real-world contexts chosen for study. So the results obtained from the use of real-world contexts and abilities would be hard to interpret, especially if the question asked is how "correct" these estimates are.

One product of the research program of the contract is a proposed solution to these difficulties. The proposed solution is expensive, time-consuming, and difficult, but it may work. It is to synthesize a partially artificial world, of the greatest complexity consistent with tractability, that has well-specified probabilities built into it. Once this complex artificial world has been built, it is necessary to train subjects to be expert about it, a long process. After that has been done, they can be exposed to information-processing tasks appropriate to that artificial world and asked to estimate P(D|H) or similar quantities for suitable data and hypotheses. Both their estimates and the performance of the probabilistic information-processing system that uses them can be evaluated by comparison with the "true" probabilities built into the world to start with, and perhaps also by comparison with the performance of nonprobabilistic systems.

The semisimulation research planned under this contract will do just that.
2. CONSERVATISM IN COMPLEX PROBABILITY INFERENCE*


This experiment examines the relationship between estimated probabilities and probabilities calculated by means of Bayes's theorem; it compares human performance with optimal performance in the task of revising opinion in the light of new information.

To provide setting and vocabulary, a number of contemporary ideas concerning probability must be briefly summarized. The numbers called probabilities are formally well-defined by the assertions that they are numbers between zero and one, and that over a mutually exclusive and exhaustive set of hypotheses (hereafter called a partition) they must add to one. But three fundamentally different operations have been proposed to relate those numbers to observable events in the real world. The currently dominant frequentistic view defines a probability as the limit of the relative frequency with which a particular phenomenon occurs; probabilities can be estimated, for example, by operations like tossing a coin many times under "substantially equivalent" conditions, and then using the ratio of heads to total tosses as an estimate of the probability of heads. The symmetristic view appeals to observable symmetries to make plausible the notion of a collection of equally likely elementary events; the faces of a die are considered equally likely to come up because the die is symmetrical. The personalistic view defines a probability as an orderly and coherent judgment made by a rational person who brings to bear upon the immediate question his relevant past experience, of whatever kind. Probabilities so defined are called personal probabilities, and describe the person judging the event as well as the event itself.

Corresponding to these three philosophical positions about the foundations of probability are three quite different ways of displaying probabilities. Symmetry displays are common and effective; examples are cards, dice, roulette wheels, and the like. Frequency displays are very rare, mostly because frequencies are usually based on counts of random samples, and the notion of randomness is usually defined by an appeal to symmetry. What might be called plausibility displays are most common of all. We make intuitive nonnumerical judgments of probability at every moment of our lives, and any information display that influences such judgments without necessarily appealing to symmetry or relative frequency (or mathematical necessity) may be called a plausibility display.

Philip [1], Stevens and Galanter [2], and Shuford [3] have found that simultaneously displayed relative frequencies can be quite accurately estimated on the basis of exposures too short to permit counting, and Robinson [4] found the same thing for sequentially displayed relative frequencies. Teichner [5], in a more complex task using a frequency display, obtained somewhat less accurate performance.

*This section was prepared by Lawrence D. Phillips, William L. Hays, and Ward Edwards.
A large number of experiments have attempted to infer judged probabilities from observed acceptances and rejections of bets, assuming that subjects based their decisions on some version of the well-known Subjective Expected Utility (SEU) model. (For reviews of this literature and of the model, see Edwards [6, 7].) Such studies typically use symmetry displays of probability such as are provided by dice, spinners, and the like. In effect, then, probabilities inferred from decisions via the SEU model are compared with probabilities displayed directly by means of symmetry displays. The large systematic differences that are almost always found in such studies imply either that symmetry displays produce severe distortions of judged probabilities (compared with the generally accurate judgments of relative frequencies, for example) or else, more plausibly, that the SEU model is descriptively inadequate and so is an inappropriate basis for inference of judged probabilities. Methodological and formal difficulties dominate this literature, and few firm conclusions are possible.

Some of the descriptive inadequacies of the SEU model can be alleviated by using a conception of probability that does not require the sum of the probabilities of a mutually exclusive and exhaustive set of events to be one, or any other constant. Whether such numbers deserve to be called probabilities could be argued, but they can be so considered, and a nonadditive SEU model is not internally contradictory (Edwards [8]), although in such a model, utilities must be measured on a ratio, rather than an interval, scale. Such possibly nonadditive probabilities inferred from the choices of real people might well be called subjective probabilities, to distinguish them from the personal probabilities that might be inferred from the choices of ideally consistent people.

The clouds on Venus either contain a lot of water vapor or they do not. For a frequentist, the probability of this proposition is therefore either one or zero, if it is defined at all. A personalist, however, prefers to express his uncertainty about the clouds on Venus (and indeed about any topic) as explicitly as possible, and uses probabilities to do so. He consequently considers the probability that the hypothesis is true, a notion meaningless to frequentists.

Bayes's theorem, an elementary and noncontroversial consequence of the definition of conditional probability and of the requirement that probabilities must add up to one over a mutually exclusive and exhaustive set of events, has some usefulness for frequentists. For personalists, however, it plays a crucial role; it is the formally appropriate rule specifying how the probability that a hypothesis is true should be revised in accord with new data. It is therefore an optimal model for revision of opinion in the light of information, that is, for information processing.

Bayes's theorem can be expressed as follows:

$$P(H \mid D) = k\, P(D \mid H)\, P(H) \qquad (1)$$
where P(H|D) is the probability assigned to hypothesis H, given knowledge of the datum D; P(H) is the probability assigned to H before D was known; and P(D|H) is the probability of getting datum D if H is true. The normalizing constant, k, ensures that

$$\sum_{i=1}^{m} P(H_i \mid D) = 1$$

over the m elements of the partition.


It is easy to show that

$$\frac{1}{k} = P(D) = \sum_{i=1}^{m} P(D \mid H_i)\, P(H_i)$$
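Equation (1) and the expression for 1/k translate directly into a few lines of code. The sketch below, with the hypothetical function name `bayes_posterior` and illustrative numbers not taken from the experiment, computes P(D), then k, then the posterior for every element of the partition.

```python
def bayes_posterior(prior, likelihood):
    """Eq. (1): P(H_i|D) = k * P(D|H_i) * P(H_i), with 1/k = P(D).

    prior[i] is P(H_i) and likelihood[i] is P(D|H_i) for the i-th element
    of the partition.
    """
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must cover the same partition")
    p_data = sum(l * p for l, p in zip(likelihood, prior))  # P(D) = 1/k
    if p_data == 0:
        raise ValueError("the datum has zero probability under every hypothesis")
    k = 1.0 / p_data
    return [k * l * p for l, p in zip(likelihood, prior)]


# Illustrative numbers: four equally likely hypotheses and one datum.
posterior = bayes_posterior([0.25, 0.25, 0.25, 0.25], [0.4, 0.1, 0.1, 0.2])
print(posterior)
```

Here P(D) = 0.2, so k = 5, and the posterior is proportional to the likelihoods because the prior is uniform.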


P(H|D) is called the posterior probability of H, and P(H) is the prior probability; P(D|H) is called the likelihood (of datum D on hypothesis H). Under circumstances such as those prevailing in our experiment, the likelihood of several data is simply the product of the individual likelihoods. Formally, this simple rule is appropriate only if the data are conditionally independent of one another given each of the hypotheses; for a discussion of the difficult topic of conditional independence, see Edwards, Lindman, and Savage [9].
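The product rule for conditionally independent data can be checked numerically: updating one datum at a time, with each posterior serving as the prior for the next datum, gives the same result as a single update using the product of the likelihoods. The likelihood values below are invented for illustration; the prior of 0.1 for the first hypothesis with the remainder split equally mirrors the prior construction described for the experiment.

```python
from math import prod


def update(prior, likelihood):
    """One application of Bayes's theorem: multiply, then renormalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


prior = [0.1, 0.3, 0.3, 0.3]
# Hypothetical P(D_j | H_i): one row per datum, one column per hypothesis.
data_likelihoods = [
    [0.30, 0.10, 0.05, 0.20],
    [0.25, 0.15, 0.10, 0.10],
    [0.30, 0.05, 0.20, 0.15],
]

# Sequential processing: each posterior becomes the prior for the next datum.
sequential = prior
for likelihood in data_likelihoods:
    sequential = update(sequential, likelihood)

# Batch processing: the joint likelihood is the product over data, per hypothesis.
joint = [prod(column) for column in zip(*data_likelihoods)]
batch = update(prior, joint)

print([round(p, 4) for p in sequential])
print([round(p, 4) for p in batch])
```

The agreement of the two results is what makes sequential and nonsequential presentation of the same data equivalent for an ideal Bayesian processor.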

Thus, Bayes's theorem says that the probability assigned to a hypothesis after observing the datum (or data) D is directly proportional to the probability assigned to the hypothesis before observing the datum multiplied by the likelihood of the datum.


These experiments compared the posterior probability estimates of several subjects with the probabilities calculated by means of Bayes's theorem and investigated several variables that affect posterior estimates. Subjects were told that the artificial environment for this experiment could be in exactly one of four states, referred to as hypotheses, and that they would observe data generated by only one of these hypotheses. Subjects were shown the values of the individual likelihoods, that is, P(D|H) for each possible datum, and were given the prior probabilities assigned to the hypotheses before observing any data. Then subjects were asked to revise their opinions about which hypothesis was true after each new datum. However, subjects were not allowed to make any computations; they were required simply to make intuitive estimates of the posterior probabilities. Since the only constraint placed on subjects was that their estimates be between zero and one, the posterior estimates can be considered subjective probabilities. Personal probabilities were calculated from Bayes's theorem using the given prior probabilities and likelihoods.


2.1. EXPERIMENT ONE
2.1.1. METHOD

2.1.1.1. Procedure. Each subject was seated at a console and asked to imagine himself at the output of a large, computerized radar system. Subjects were told that the environment in which this system operated was in one of four states: enemy attack, friendly activity, meteor shower, or enemy attempt to spoof the surveillance system. The system detected aerial activity and computed the predicted points of impact of the objects detected. These points, the data of the experiment, were displayed to the subject on a representation of a circular land mass that had been divided into twelve sectors. (A sample display is shown in Fig. 1.) Impact points always appeared within the sectors, never on the sector borders. These displays were projected from a 35-mm slide projector onto the rear of a rectangular viewing screen, 12 by 8 inches, located on the console slightly above eye level when the subject was seated.

After each display the subject estimated the posterior probabilities that the system was detecting each of the four kinds of activity. His estimates were made by setting four levers mounted at five-inch intervals on the sloping front panel. Each lever had a 12-inch travel, with the 0 setting nearest the subject, the 1.0 setting furthest from the subject, and calibration marks every 0.05.

To help him make these estimates, the subject was given the prior probability for the Enemy hypothesis and all possible values of P(D|H). The prior probability was displayed above the unused response levers. The displays of P(D|H) for each of the four hypotheses were located above the middle four response levers. This row of displays and the response levers are shown in Fig. 2 (the two outside levers were unused). The probabilities shown are the ones used in this experiment.

The subject was told that the display of P(D|H) for a particular hypothesis represented the probabilities that the impact points would fall in the corresponding sectors if in fact that kind of activity was occurring. He received no instructions about how to use these numbers, except the obvious qualitative statements, and was not told that the likelihood of several dots is equal to the product of the likelihoods for the individual dots.

After  making  his  estimates  for  a  display,  ‘he  subject  pressed  a  ^ton  that  infracted  the 
machine  to  record  them;  <  tter  that,  he  reset  his  levers  to  zero  before  seeing  the  next  display. 

No constraint was placed on the sum of the posterior probability settings; subjects who asked were told the sum was up to them. Subjects gained familiarity with the apparatus during the instruction session and during the subsequent trial run. Subjects were never told anything about the quality of their estimates. (Complete instructions are in Appendix A.)

2.1.1.2. Stimuli. Each subject was presented with 32 ordered sequences of 15 stimulus slides each, and with 32 scrambled sequences constructed from the ordered sequences. The first slide in an ordered sequence contained only one dot (impact point), the second showed the first dot plus a new one, the third slide contained the first two dots plus a new one, and so forth for the remaining slides. Each ordered sequence was designated with a two-digit number.

The scrambled sequences were constructed by mixing two ordered sequences together and drawing at random without replacement two new sequences of fifteen slides from the total set of thirty slides. The scrambled sequences, then, showed no orderly accumulation or progression of dots. Each sequence, ordered or scrambled, had one of three Enemy prior probabilities associated with it: 10%, 25%, or 67%. Plots of the theoretical (Bayesian) posterior probabilities for three representative sequences are shown in Fig. 3. Bayesian probabilities were computed using the Enemy prior probability, which was given, and assuming that the remaining prior probability was distributed equally over the other three hypotheses.
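A minimal Python sketch of the two stimulus-construction rules just described: spreading the non-Enemy prior equally over the other three hypotheses, and building two scrambled sequences from two ordered ones. The function names and slide labels are hypothetical, and shuffle-then-split is simply one way to draw without replacement, not necessarily the procedure the experimenters followed.

```python
import random

def priors_from_enemy(p_enemy):
    """Give Enemy its stated prior and split the remainder equally
    over Spoof, Friendly, and Meteor."""
    rest = (1.0 - p_enemy) / 3
    return {"Enemy": p_enemy, "Spoof": rest, "Friendly": rest, "Meteor": rest}

def scramble(seq_a, seq_b, rng):
    """Pool the slides of two ordered sequences and draw, at random and
    without replacement, two new sequences of equal length."""
    pool = list(seq_a) + list(seq_b)
    rng.shuffle(pool)  # a uniform random permutation of all pooled slides
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Slides labelled (sequence, position); the sequence numbers are illustrative only.
ordered_28 = [(28, k) for k in range(1, 16)]
ordered_38 = [(38, k) for k in range(1, 16)]
scrambled_1, scrambled_2 = scramble(ordered_28, ordered_38, random.Random(1))

priors = priors_from_enemy(0.67)  # the other three hypotheses get about 0.11 each
```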

For any given ordered sequence, the dots fell into exactly three of the 12 sectors, but the three sectors used were not necessarily the same from sequence to sequence.

To summarize, three variables were investigated: amount of information (number of dots), prior probabilities, and order of presentation of information (ordered vs. scrambled).

Subjects participated in two-hour sessions, during which six to eight sequences were usually completed, until all 64 sequences had been shown. The total time a subject needed to complete all sequences varied from 14 to 25 hours. The order of presentation of the stimulus sequences was partially counterbalanced over the five subjects by use of a lattice design (Cochran and Cox [10]).


2.1.1.3. Subjects. Five volunteers, male University of Michigan freshman engineering students, were paid at the rate of $1.25 per hour. This population was chosen to ensure familiarity with quantitative reasoning and ignorance of Bayes's theorem.

2.1.2. RESULTS. It is convenient to discuss several minor findings first, since they permit great simplification of analyses of the major finding.

2.1.2.1. Sum of Posterior Settings. One subject spontaneously attempted to normalize the sum of his posterior settings. Another asked if his settings should sum to one, but when told that he could do as he liked, did not normalize his settings. The remaining three subjects did not normalize their settings. For these latter four subjects, the sums of their posterior probability settings increased with the number of stimulus dots. None of the other variables had any consistent influence on this sum. Introspection and inquiry suggest that the main reason for this is that subjects are much more willing to increase an estimate for a diagnosis favored by a new item of evidence than to decrease estimates for the diagnoses not favored by that item. Plots of the sums for each subject are shown in Fig. 4. These sums are averages over all 64 stimulus sequences.
2.1.2.2. Analysis of Variance of Deviations. Squared deviations of subjects' posterior estimates of enemy attack from the theoretical Bayesian values were computed. The mean over the number of dots of the squared deviations was defined as a measure of the amount of deviation from optimal performance. Analysis of variance of this measure showed that the main effect of prior probabilities was significant beyond the 0.01 level. The order of presentation of information showed absolutely no effect; the interactions of order with prior probability and with individual sequences were insignificant. For that reason, subsequent data analyses combine data obtained from both ordered and scrambled conditions, or else consider only the ordered condition.
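The deviation measure used in this analysis can be written as a short function. This is only a sketch: the function name and the three-slide numbers are invented for illustration and do not come from the data.

```python
def mean_squared_deviation(estimates, bayes):
    """Mean, over the slides of one sequence, of the squared difference
    between a subject's Enemy estimate and the Bayesian Enemy probability."""
    if len(estimates) != len(bayes) or not bayes:
        raise ValueError("need one estimate per Bayesian value")
    return sum((e - b) ** 2 for e, b in zip(estimates, bayes)) / len(bayes)

# Hypothetical three-slide example in which the subject lags behind Bayes's theorem.
msd = mean_squared_deviation([0.30, 0.40, 0.50], [0.30, 0.60, 0.90])  # (0 + 0.04 + 0.16) / 3
```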

Lack of significance of the order variable is surprising. Apparently, subjects' deviations from optimality are unaffected by the order in which they receive information. In this respect, subjects' behavior is like that of Bayes's theorem. In order to compute posterior probabilities for any given slide, the theorem needs to know only the conditional probabilities of observing all the data displayed, and the prior probability that obtained before the data were observed. These probabilities can be obtained without knowledge of any other slides. We conclude, then, that for this task, subjects are little affected by the sequential nature of the information in the ordered sequences; each slide is treated as a separate problem.
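The claim that Bayes's theorem is indifferent to the order of the data can be checked directly: permuting the dots leaves the joint likelihood, and hence the posterior, unchanged. The sketch below assumes conditionally independent dots and uses a hypothetical two-hypothesis, three-sector example rather than the experiment's displays.

```python
from itertools import permutations
from math import isclose, prod

def posterior(priors, likelihoods, dots):
    """Bayes's theorem applied once to all displayed dots.
    likelihoods[h][s] is P(sector s | hypothesis h)."""
    joint = [p * prod(lik[s] for s in dots) for p, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-hypothesis, three-sector example.
priors = [0.25, 0.75]
likelihoods = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
dots = [0, 2, 0, 1]

# Every ordering of the same dots gives the same posterior (to rounding),
# because a product of likelihoods does not depend on the order of its factors.
results = [posterior(priors, likelihoods, order) for order in permutations(dots)]
order_invariant = all(
    isclose(r[k], results[0][k]) for r in results for k in range(len(priors))
)
```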

To  facilitate  more  meaningful  analyses  of  the  data,  subjects’  posterior  estimates  were 
adjusted  proportionately  so  that  the  sum  over  the  four  hypotheses  was  one.  Analyses  in  the 
remainder  of  this  report  use  only  the  normalized  data. 
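Proportional normalization amounts to dividing each lever setting by the sum of all four. A minimal sketch, with a hypothetical function name and hypothetical raw settings:

```python
def normalize(settings):
    """Scale raw lever settings proportionately so they sum to one."""
    total = sum(settings.values())
    if total <= 0:
        raise ValueError("lever settings must have a positive sum")
    return {h: v / total for h, v in settings.items()}

# Hypothetical raw settings summing to 1.2, as an unnormalizing subject might leave them.
raw = {"Enemy": 0.60, "Spoof": 0.30, "Friendly": 0.20, "Meteor": 0.10}
normalized = normalize(raw)  # Enemy about 0.5, Spoof about 0.25; ratios are preserved
```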

2.1.2.3. Deviations from Bayes's Theorem. Figure 5 shows representative plots of subjects' estimates (after normalization) as a function of the number of stimulus dots. These estimates should be compared with the Bayesian probabilities shown in Fig. 4. The lack of any systematic difference between ordered and scrambled presentation is evident. But the most striking finding is the very small amount by which subjects changed their probability estimates from one stimulus to the next, even when Bayesian probabilities showed considerable change.

In nearly every sequence, subjects exhibited this conservatism. Subject Three is the only exception; he sometimes moved more than Bayes's theorem. In some cases, notably for Subject Four, the posterior estimates moved toward one another instead of toward zero or one as the number of dots increased. This subject apparently became less sure as the evidence mounted up. Even on problems as easy as the top one in Figs. 4 and 5, four of the five subjects failed to reach anything like the extreme posterior probabilities that would be appropriate.

Subject Three, the most nearly Bayesian subject throughout the experiment, did better than any of the others, but still not well. Though the details vary from sequence to sequence and from subject to subject, the finding is the same for nearly all: subjects failed to be as sure as Bayes's theorem would permit them to be, and stopped modifying their opinions in the light of additional information while they were still very far from posterior probabilities of one and zero.

FIGURE 3. REPRESENTATIVE PLOTS OF SUBJECTS' NORMALIZED ESTIMATES. (a) Subject One, Sequence 28, Ordered. (b) Subject One, Sequence 28, Scrambled. (c) Subject Five, Sequence 12, Ordered. (d) Subject Five, Sequence 12, Scrambled. (e) Subject Two, Sequence 38, Ordered. (f) Subject Two, Sequence 38, Scrambled.

2.1.2.4. Scatterplots. To determine whether this conservatism was consistent, scatterplots were constructed of the normalized posterior probabilities estimated by each subject as a function of Bayesian posterior probabilities for the ordered presentations. In Fig. 6 one set of scatterplots is shown for Subject One, who was neither the most Bayesian nor the least Bayesian subject. Two variables have been retained. One is the number of stimulus dots, which takes a different value in each row: the first row is for one dot, the second for three, the third for six, and the fourth for nine. Because Bayesian probabilities (though not subjects' estimates) generally go to zero or one for more than nine dots, no further plots were made. The other variable is the prior probability of the Enemy hypothesis. The first column is for all sequences with a prior probability of 0.10, the second for 0.25, and the third for 0.67. Estimates for the individual hypotheses have not been distinguished on these plots because more detailed analysis showed nothing systematically meaningful, except for the occasional underestimation of the Enemy hypothesis.

Subject One showed remarkably Bayesian performance for one dot. He seems to have used the prior probabilities effectively and to have been able to modify them properly on the basis of the first dot. But he became progressively less Bayesian as he obtained more information. His deviations from Bayes's theorem, however, were relatively consistent. He appears increasingly to have underestimated high posterior probabilities and overestimated low ones, until by the ninth slide the best-fitting lines through his scatterplots would be almost horizontal.

Subject Three does not show such consistency. He initially underestimated the low posterior probabilities and overestimated the high probabilities. However, in general, the best-fitting lines through his scatterplots would be nearly 45° lines. The other subjects showed varying degrees of consistency. The underestimation-overestimation tendencies of these subjects varied with the number of dots and were often confounded with prior probability.

These plots clearly show that no single function relates posterior estimates to Bayesian posterior probabilities for all subjects. Some subjects are more Bayesian near the beginning of the sequence, others nearer the end; this depends, in part, on the prior probability of the sequence. The variable that has the most pronounced effect on the relationship between posterior estimates and Bayes's probabilities is the number of stimulus dots.

FIGURE 6. SCATTERPLOTS OF BAYES'S POSTERIOR PROBABILITY AS A FUNCTION OF POSTERIOR ESTIMATES BY SUBJECT ONE. Each row represents a different value of the number of stimulus dots. The three plots in each row are for those ordered sequences with enemy prior probability of 10%, 25%, and 67%.

2.1.2.5. A Performance Index. In order to show, on only one plot, the total performance of an individual subject, we devised a Performance Index (PI). Squared deviations from Bayes's theorem are misleading indices of performance. If a very conservative subject simply set the posterior estimation levers at 0.25 regardless of the stimulus, his squared deviations would be lower for the more ambiguous, and thus presumably more difficult, sequences such as 38, and higher for the less ambiguous, easier sequences such as 28. While it is obvious that this subject's performance is more like Bayes's theorem for the more ambiguous sequences than for the less ambiguous ones, it would be misleading to conclude that the quality of the subject's performance is different when he deals with the ambiguous ones than when he deals with the unambiguous.

Thus, a good Performance Index should indicate very non-Bayesian performance, and remain constant with varying number of dots, whenever subjects leave their levers at 0.25. It should also indicate perfect Bayesian performance, and remain constant, whenever subjects make estimates identical to numbers calculated according to Bayes's theorem. A ratio of squared deviation scores will exhibit these properties:

$$
\mathrm{PI} = \frac{\sum_{i=1}^{4}\left[\hat{P}_n(H_i \mid D) - P_n(H_i \mid D)\right]^2}{\sum_{i=1}^{4}\left[0.25 - P_n(H_i \mid D)\right]^2} \times 100
$$

where $\hat{P}_n(H_i \mid D)$ is the normalized posterior probability of hypothesis $H_i$ estimated by a given subject after $n$ dots, and $P_n(H_i \mid D)$ is the posterior probability of hypothesis $H_i$ from Bayes's theorem after $n$ dots. In words, this measure is defined as the ratio of the sum over the four hypotheses of the squared deviations of an individual subject's posterior estimates from Bayes's theorem to the sum over the four hypotheses of the squared deviations of 0.25 from Bayes's theorem, multiplied by 100.
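A direct transcription of the Performance Index into Python, offered as a sketch; the function name and the example posteriors are hypothetical. The usage lines reproduce the two anchor values of the index: 0 for estimates identical to Bayes's theorem and 100 for levers left at 0.25.

```python
def performance_index(estimates, bayes):
    """PI = 100 * sum_i (est_i - bayes_i)^2 / sum_i (0.25 - bayes_i)^2,
    summed over the four hypotheses for one subject at one number of dots.
    `estimates` should already be normalized to sum to one."""
    num = sum((e - b) ** 2 for e, b in zip(estimates, bayes))
    den = sum((0.25 - b) ** 2 for b in bayes)
    if den == 0:
        raise ValueError("PI is undefined when every Bayesian value is 0.25")
    return 100 * (num / den)

bayes = [0.70, 0.10, 0.10, 0.10]  # hypothetical Bayesian posteriors
perfect = performance_index(bayes, bayes)      # perfectly Bayesian estimates
flat = performance_index([0.25] * 4, bayes)    # levers left at 0.25
partway = performance_index([0.45, 0.20, 0.20, 0.15], bayes)  # conservative estimates
```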

If a subject is perfectly Bayesian, his PI will be zero. If he leaves his levers at 0.25, his PI will be 100; 100 is therefore a kind of baseline, or definition of absurdly poor performance.

But if a subject gets a score of 100, he did not necessarily have all his levers at 0.25; he only indicated settings that gave summed deviations precisely the same as those of settings at 0.25. One difficulty with this measure is that only the values 0 and 100 are easily interpretable. Figure 7 shows PI as a function of the number of dots, averaged over all sequences (ordered and scrambled) with the same prior probability. In interpreting these plots it is necessary to keep in mind one particular property of Bayes's theorem: as more and more data are collected, the prior probability becomes more and more irrelevant to the value of the posterior probability. This is illustrated in Fig. 8 for Sequence 41. Enemy and Spoof probabilities are plotted (Friendly and Meteor probabilities are very low) for Enemy and Spoof prior probabilities of 0.67 and 0.11, respectively, and for 0.25 and 0.25. For more than five dots, the curves are reasonably close to one another. This is even more marked in sequences where the probabilities do not cross and where they quickly converge on 0 or 100 percent. Thus, for a subject to be perfectly Bayesian, he would have to give less weight to the prior probability the more dots he sees. Failure to weight the prior information correctly relative to the observed data will cause the PI to be greater than zero.
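The washing-out of the prior can be reproduced numerically. The sketch below computes Spoof posteriors under two prior sets matching those named for Fig. 8 (Enemy 0.67 with 0.11 for each of the others, and all four at 0.25), but the sector tables and the dot sequence are hypothetical, so only the qualitative pattern, not Sequence 41 itself, is illustrated. Dots are assumed conditionally independent given the hypothesis.

```python
from math import prod

def posterior(priors, likelihoods, dots):
    """Bayes's theorem over all dots; likelihoods[h][s] = P(sector s | H_h)."""
    joint = [p * prod(lik[s] for s in dots) for p, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def sector_table(favored):
    # Hypothetical: 0.2 on each favored sector, 0.4 spread over the other nine.
    return [0.2 if s in favored else 0.4 / 9 for s in range(12)]

# Order: Enemy, Spoof, Friendly, Meteor (hypothetical sector tables).
likelihoods = [sector_table({0, 1, 2}), sector_table({3, 4, 5}),
               sector_table({6, 7, 8}), sector_table({9, 10, 11})]
dots = [3, 4, 0, 5, 3, 4, 5, 3, 4]  # hypothetical sequence that mostly favors Spoof

enemy_heavy = [0.67, 0.11, 0.11, 0.11]
uniform = [0.25, 0.25, 0.25, 0.25]

# Gap between the two Spoof posteriors after n = 1, ..., 9 dots.
gaps = [
    abs(posterior(enemy_heavy, likelihoods, dots[:n])[1]
        - posterior(uniform, likelihoods, dots[:n])[1])
    for n in range(1, len(dots) + 1)
]
```

In this example the gap is roughly 0.24 after one dot and shrinks to well under 0.001 by the ninth, because the accumulated likelihood ratio swamps the difference between the priors.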

The Performance Index can give only a rough indication of why subjects deviate from Bayesian performance. Some subjects show PI scores that are initially very low (close to Bayes's theorem) and then increase to a constant value. Others start very high, sometimes above 100, and then decrease to a constant value. Notice that the constant value attained by these latter subjects is usually lower than that of the subjects who start low.

A PI curve that starts low and then increases can be explained as characteristic of performance that tends to weight the prior information too heavily, at least for n greater than one. As more and more dots appear, giving too much weight to the prior probability will result in a gradually increasing PI. A curve that starts high and then decreases would result from performance that tends to weight the prior information insufficiently; as the data accumulate, ignoring the prior probability becomes less and less serious, and the PI decreases. In both cases, a constant value is reached because, on the average, the performance of neither the subject nor Bayes's theorem changes very much after about seven dots. The constant value should be lower (better) for those subjects who weight prior information less heavily than for those who do the opposite, since the prior information becomes increasingly irrelevant as the amount of data increases. This relative difference in the constant values will hold, of course, only if the differences in estimating the conditional probabilities are not too great.

Thus, Subjects One and Four, and to a lesser extent Two, appear to weight prior information too heavily. Note that the shape and smoothness of the curves for Subject One agree very well with what can be predicted from his scatterplots. Subjects Three and Five appear to underweight prior information, at least for sequences with prior probabilities of 0.10 and 0.25, and their constant values are lower than those for Subjects One, Two, and Four. The data of Subjects One and Four suggest that high prior probabilities may partially correct the tendency to underweight prior information, for the shapes of their curves for sequences whose prior probability is 0.67 are quite different from those produced by the other prior probabilities.

The terminal value of the Performance Index for Subject Three, a little less than 40, is smaller than that for any other subject, but not much. The central tendency of his performance is closer to Bayes's theorem than that of any other subject, but his estimates scatter more widely around the central tendency than do those of any other subject. This observation highlights a deficiency of the Performance Index (and of any other error measure based on mean squared error): it cannot discriminate between random and constant error. The errors found in this experiment are mostly constant rather than random errors, as is often the case when performance is being compared with some standard of perfection.
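The inability of a mean-squared-error index to separate constant from random error follows from the fact that squaring discards the sign of each error. In the hypothetical example below, one set of estimates is uniformly 0.1 too high and the other alternates 0.1 above and below the target; both give the same mean squared error, and only the mean signed error (bias) tells them apart. The function names and numbers are illustrative only.

```python
from statistics import mean

def mse(estimates, targets):
    """Mean squared error between estimates and target (Bayesian) values."""
    return mean((e - t) ** 2 for e, t in zip(estimates, targets))

def bias(estimates, targets):
    """Mean signed error: nonzero for constant error, near zero for random error."""
    return mean(e - t for e, t in zip(estimates, targets))

targets = [0.5, 0.5, 0.5, 0.5]      # hypothetical Bayesian values
constant = [0.6, 0.6, 0.6, 0.6]     # every estimate 0.1 too high
random_err = [0.4, 0.6, 0.4, 0.6]   # errors of 0.1 in both directions

same_mse = abs(mse(constant, targets) - mse(random_err, targets)) < 1e-12
```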
Thus, the PI plots show that subjects' deviations from Bayes's theorem can be partly explained by failure to weight the prior information properly. They also confirm that the actual level of performance depends on subjects, prior probabilities, and the number of dots. No further information was gained by plotting PI as a function of the number of dots for all sequences in the same set; PI plots averaged over subjects for the interaction between prior probability and sets showed nothing of interest.


2.1.3. DISCUSSION. The primary conclusion indicated by this experiment should surprise no one: men are suboptimal processors of probabilistic information. Several things about the finding are a little surprising. For one thing, the size of the discrepancy is large, surprisingly large compared with our expectation. For another thing, we have failed to find any subjects who consistently leaped to a conclusion more quickly than is justified by the evidence. In fact, most subjects simply refused to estimate an extremely large posterior probability at all, in spite of the fact that they seemed to find it easier to judge what diagnosis was favored by a new item of information than to judge what diagnosis was made less probable by that item. Even in college populations, some men must have a tendency to jump to conclusions; yet this experiment has failed to exhibit any such tendency in any subject. Perhaps such men do not find service as paid subjects in psychological experiments sufficiently attractive to volunteer for it.

Underestimation of high probabilities and overestimation of low ones, often reported in decision-theory experiments (e.g., Mosteller and Nogee [11], Preston and Baratta [12]), are not invariably found in this experiment. They are largely absent in one subject, dependent on amount of information for others, and confounded with prior probability for all. Still, the congruence between the findings of this experiment and those of experiments concerned with estimation of relative frequencies (Philip [1], Stevens and Galanter [2]) or with probabilities inferred from choices among bets (Griffith [13], Mosteller and Nogee [11]) suggests an underlying tendency toward conservatism in estimation and use of probabilities over a wide class of tasks, at least among college students. (But for conflicting evidence see Ikde [14], and for an argument that things are more complicated than this, see Edwards [8].)


Other factors that may have influenced performance are display parameters. A pretest of several different methods of displaying P(D|H) resulted in the displays shown in Fig. 1. Although no subject complained about these displays, the question lingers whether they may have accounted, in part, for the conservative behavior. None of the displays shows a sector probability greater than 0.25. Perhaps the (necessarily) low numbers on the P(D|H) displays suggested to the subjects that there should not be too much difference between their estimates, whatever the data.


Further, we assumed that it was necessary to display only the Enemy prior probability, because subjects would distribute the remaining probability equally among the other three hypotheses. In view of the finding that subjects' estimates do not always sum to one, it is questionable whether they knew how much probability was left to be distributed among the remaining alternatives. The consistency shown by some subjects on the scatterplots indicates that this is probably not a serious problem, but prior probabilities will be displayed for each hypothesis in future experiments.

Another methodological issue concerns the utilities that some subjects may have attached to the particular hypotheses. One subject showed occasional underestimation of Enemy probabilities, suggesting that he was especially conservative in making this diagnosis. However, the scatterplots were originally drawn so that estimates for each hypothesis could be examined separately, and the estimates made for one hypothesis very rarely showed any consistent deviation from the other estimates. So this issue is probably not very important either.

The remaining experiments reported in this section examine two factors that could have contributed to the conservatism of the subjects. One is the artificiality of the stimulus display, and the other concerns the method of responding.

2.2. EXPERIMENT TWO*

In Experiment One the stimulus dots were constrained to appear in only three of the twelve sectors of the display. If each sequence of 15 dots had been randomly generated under the truth of exactly one of the hypotheses, then the dots would have been distributed over more than three sectors. The artificial constraint on the distribution of dots produced sequences that looked unlike any of the hypotheses; that might be why subjects estimated conservatively.
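How many sectors a sequence generated under a single hypothesis would typically touch can be computed without simulation: a sector with probability p is hit at least once in 15 independent dots with probability 1 - (1 - p)^15, and summing over sectors gives the expected count. The sector probabilities below are hypothetical (kept at or below 0.25, like the displays discussed in Section 2.1.3) and are not the experiment's tables; independence of the dots is likewise an assumption of the sketch.

```python
def expected_distinct_sectors(sector_probs, n_dots):
    """Expected number of distinct sectors hit by n_dots independent dots:
    the sum over sectors of P(hit at least once) = 1 - (1 - p) ** n_dots."""
    return sum(1 - (1 - p) ** n_dots for p in sector_probs)

# Hypothetical P(sector | H) over the 12 sectors.
probs = [0.25, 0.20, 0.15, 0.10, 0.08, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.01]

expected = expected_distinct_sectors(probs, 15)  # about 6.9 sectors for this table
# Chance that all 15 dots land in just the three most likely sectors (0.6 ** 15).
p_top3_only = sum(probs[:3]) ** 15
```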

This experiment tests the hypothesis that conservative posterior probability estimation in the original experiment was due, at least in part, to the artificial constraint on the dots variable. For the experiment, new sequences were generated, each having posterior probabilities approximately equal to the posterior probabilities of a sequence in the original study; however, the dots were distributed over several sectors, to look like a more representative sample than did the original sequences.

2.2.1.  METHOD 

2.2.1.1. Apparatus. Conditional probability displays and apparatus were the same as those used in Experiment One.

2.2.1.2. Stimuli. Subjects were shown eight sequences of fifteen dots each. For all sequences, prior probabilities were given as 0.25. Four of them were sequences 35, 44, 24, and 41 of Experiment One, with the dots appearing in only three of the twelve sectors. Four new sequences were constructed in which the dots were distributed over more than three sectors. However, the new sequences had posterior probabilities very nearly identical to the posterior probabilities for the old sequences when the posterior probabilities for all sequences were calculated from Bayes's theorem using prior probabilities of 25%. By the fifteenth dot, the posterior probabilities of every sequence are near one or zero. Figure 9 shows the distributions of dots for the original Sequence 44 and for its equivalent new sequence. Figure 10 shows the Bayesian posterior probabilities for these sequences.

*This experiment was run by Richard Norman.

2.2.1.3. Procedure. Each subject was shown all eight sequences in random order, in two sessions lasting a total of about three hours. Conditions were comparable to the ordered sequence presentation of the Phillips, Hays, and Edwards experiment.

2.2.1.4. Subjects. Four men, University of Michigan undergraduates, served as subjects. They were paid $1.50 per hour.

2.2.2. RESULTS. Subjects' revisions of their estimates from one dot to the next were generally more conservative than the revisions of probability calculated from Bayes's theorem.

An analysis of variance was computed using as the dependent variable the absolute deviations of subjects' estimates from Bayes's theorem for the correct hypothesis. The fifteen deviations generated from a single sequence by one subject were treated as independent measures, an assumption justified by the insignificance of the order-of-presentation variable in Experiment One.
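The dependent variable can be sketched as below. The sketch assumes estimates and Bayesian values are expressed in percentage points, which is an assumption since the unit is not stated here, and all numbers are hypothetical. In the analysis of variance each deviation entered as a separate observation; the sum shown is the kind of total reported in Table II.

```python
def summed_abs_deviation(estimates, bayes):
    """Sum over one sequence of |subject estimate - Bayesian probability|
    for the hypothesis that actually generated the dots."""
    if len(estimates) != len(bayes):
        raise ValueError("need one estimate per Bayesian value")
    return sum(abs(e - b) for e, b in zip(estimates, bayes))

# Hypothetical five-slide example, in percentage points.
subject = [30, 40, 50, 55, 60]
bayes = [40, 65, 85, 95, 98]
total = summed_abs_deviation(subject, bayes)  # 10 + 25 + 35 + 40 + 38
```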

Three independent variables were examined: (1) distribution of dots, representative or unrepresentative; (2) sequences; and (3) subjects. The sequences variable is, of course, nested within the dots variable. Table I shows that the variance due to subjects is highly significant. Variance due to dots is mildly significant, while the dots-by-subjects interaction is not significant.

TABLE I. ANALYSIS OF VARIANCE OF SUBJECTS'
DEVIATIONS FROM BAYES'S THEOREM

Source                         df        MS          F
Dots (D)                        1     1,491.08     4.87*
Sequences (Se) nested in D      6       532.61
Subjects (Ss)                   3    10,654.42    34.81**
D × Ss                          3       395.36     1.29
Se (D) × Ss                    18     1,194.38
Within cell                   480       269.96
Pooled error                  504       306.11

*P < .05
**P < .01
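The pooled error term in Table I appears to be the degrees-of-freedom-weighted average of the three residual mean squares (6 + 18 + 480 = 504 df). That reading is our reconstruction rather than a statement in the report, but the short Python check below, which uses only the table's own figures, reproduces the pooled mean square (306.10 against the printed 306.11, a rounding difference) and all three F ratios.

```python
# Residual terms from Table I as (df, MS).
error_terms = {
    "Se(D)": (6, 532.61),
    "Se(D) x Ss": (18, 1194.38),
    "Within cell": (480, 269.96),
}

# Pool them: total sum of squares (df * MS) divided by total df.
pooled_df = sum(df for df, _ in error_terms.values())
pooled_ms = sum(df * ms for df, ms in error_terms.values()) / pooled_df

# Effects tested against the pooled error term.
effects = {"Dots (D)": 1491.08, "Subjects (Ss)": 10654.42, "D x Ss": 395.36}
f_ratios = {name: round(ms / pooled_ms, 2) for name, ms in effects.items()}

print(pooled_df, round(pooled_ms, 2))
print(f_ratios)
```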
FIGURE 9. DISTRIBUTION OF STIMULUS DOTS FOR OLD AND NEW SEQUENCES. (a) Distribution of stimulus dots for Old Sequence 44. (b) Distribution of stimulus dots for New Sequence 44.

FIGURE 10. BAYESIAN POSTERIOR PROBABILITIES FOR OLD AND NEW SEQUENCES. (a) Old Sequence 44. (b) New Sequence 44.


2.2.3. DISCUSSION. Once again, the primary finding is conservatism. Every subject in this experiment extracted less certainty from the data than is justified by the Bayesian calculations.

The analysis of variance indicates that the distribution of dots does make some difference, though a glance at the mean square of the dots variable shows that this variable is not a large source of variation. The magnitude of the deviation scores is shown in Table II. The lower total deviation score for the old sequences indicates that performance was more Bayesian for the old sequences. Thus, the distribution of dots certainly does not explain the conservatism at all; what effect the dots variable does have seems to operate in a direction opposite to that hypothesized.


TABLE II. DEVIATIONS FROM BAYES'S THEOREM OF SUBJECTS' ESTIMATES
ON THE CORRECT HYPOTHESIS

                  Sequences
Subject        New        Old
1            2,287      1,995
2            2,428      2,324
3            2,086      2,110
4            1,338        864
Total        8,139      7,293

Deviations are summed over the fifteen estimates per sequence and over four sequences of each class.


To satisfy ourselves that the new sequences had posterior probabilities nearly equal to those of the corresponding old sequences, we computed an analysis of variance of the Bayesian posterior probabilities. The results, shown in Table III, do indeed confirm that the differences between old and new sequences are very small. Thus, interpretation of the first analysis of variance is not contaminated by differences between old and new sequences.


TABLE III. ANALYSIS OF VARIANCE OF POSTERIOR PROBABILITIES FOR OLD
AND NEW SEQUENCES, CALCULATED FROM BAYES'S THEOREM

Source                     df        MS         F
Dots (D)                    1      216.01     n.s.
Sequences nested in D       6    1,694.26    4.32**
Within cell               112      388.12

**P < .01


We  conclude,  then,  that  conservatism  in  this  task  is  unaffected  by  the  representative 
character  of  the  stimulus  display. 

2.3. EXPERIMENT THREE*

Experiment One suggested that there is a correlation between task difficulty and the degree to which subjects approach Bayesian performance in processing probabilistic information. Many subjects appeared to be more Bayesian when the sequences were simple, that is, when the data clearly pointed to only one hypothesis as the most likely one. Sequences appeared to become more difficult as the information became more ambiguous and contradictory. The experiment we are now reporting studies this variable, using three sequences representing three levels of difficulty, and one other variable.

The other variable studied concerns the difference between two possible interpretations
of Bayes’s theorem. The input to the theorem can be correctly expressed in two ways. One
way is to use the conditional probability of all n dots given each hypothesis, together with the
original prior probabilities (those used for n = 1). This we will call the nonsequential version
of Bayes’s theorem. The other way is to use the conditional probabilities of only the new dot at
slide n, with prior probabilities that are the posterior probabilities from slide n - 1. This we
will call the sequential version. Both methods of calculation lead to the same posterior probabilities.
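The equivalence of the two versions can be checked numerically. The sketch below is a minimal illustration, not the procedure used in the experiment: it assumes the dots are conditionally independent given each hypothesis, and the likelihood table, the data sequence, and the function names are all hypothetical. `nonsequential_posterior` multiplies the original priors by the joint probability of all n data; `sequential_posterior` feeds each posterior back in as the next prior.

```python
from math import prod


def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def nonsequential_posterior(priors, likelihoods, data):
    """Original priors times P(D1, ..., Dn | H), assuming the data are
    conditionally independent given each hypothesis."""
    joint = [prod(row[d] for d in data) for row in likelihoods]
    return normalize([p * j for p, j in zip(priors, joint)])


def sequential_posterior(priors, likelihoods, data):
    """Revise one datum at a time; each posterior becomes the next prior."""
    current = list(priors)
    for d in data:
        current = normalize([p * row[d] for p, row in zip(current, likelihoods)])
    return current


# Hypothetical P(datum category | H) for four hypotheses and three categories.
LIKELIHOODS = [
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.6, 0.2],
]
PRIORS = [0.25, 0.25, 0.25, 0.25]
DATA = [0, 1, 0, 2, 0]  # hypothetical sequence of observed categories

a = nonsequential_posterior(PRIORS, LIKELIHOODS, DATA)
b = sequential_posterior(PRIORS, LIKELIHOODS, DATA)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```

Because every step renormalizes, the two functions agree up to floating-point rounding; what differs is only what must be held in mind along the way (all n data versus the current posterior), which is the contrast between the two presentation conditions.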

Since subjects in the original experiment were presented with dots that accumulated, and
were required to reset their levers after each set of estimates, the cards were stacked in favor
of their adopting a nonsequential mode of behavior, though not necessarily a Bayesian one. The
present experiment examines the effects of presenting only one dot on the viewing screen for
any value of n, where n is the total number of dots shown so far. Subjects were not required to
reset their levers. In fact, they were told to revise on trial n + 1 the settings they left at
trial n. In other words, they were encouraged to use their posterior settings at trial n - 1 as
the prior probabilities for trial n.

Thus, the question of interest is whether subjects are more or less Bayesian in the
sequential mode than in the nonsequential mode.

2.3.2.  METHOD 

2.3.2.1. Subjects. Six summer students were subjects. All were volunteers hired through
the Student Employment Office, and each was paid $1.25 per hour. All subjects completed the
experiment in less than two hours.


*This material was prepared by Lawrence D. Phillips and Ward Edwards.
2.3.2.2. Apparatus and Method. Apparatus and conditional probability displays were the
same as those used in Experiment One. Each subject was presented first with three sequences from
the original study, numbers 12, 28, and 38 (see Fig. 3). Sequence 28 is relatively easy (the
data clearly indicate only one hypothesis as the "correct" one), 12 is moderately difficult (the
data point ambiguously to two hypotheses), and 38 is difficult (the data are ambiguous about
all four hypotheses). The order in which these sequences were presented was completely
counterbalanced for the six subjects. The prior probabilities were displayed above each of the
conditional probability displays and remained in view throughout the entire sequence of fifteen
dots. Response levers had to be reset after each slide.

Following these sequences, the subjects were presented with the same three sequences again, the
only differences being that the data and the conditional probability displays were inverted and
reversed and the dots did not accumulate. These sequences were designated 62, 78, and 88 (50 added
to the original sequence number) and were presented to each subject in the same order in
which the first three were given. The prior probabilities were displayed on a slide just before
the first dot. The subjects were required to set their levers according to the prior
probabilities displayed on the first slide, and were told to revise that estimate when shown the
first dot. They were not allowed to reset their levers to zero, and were instructed to revise their
lever settings as they received new information.

Normalization of posterior estimates was required under both presentation conditions.

The cover story attempted to attach equal utilities to the four hypotheses.

At the completion of all sequences, subjects were asked whether they had noticed any similarities
between the first three sequences and the latter three. No subject reported that he had.

2.3.3. RESULTS. On Sequences 28 and 78 all subjects tended to underestimate the high
probabilities and overestimate the probabilities for the other three hypotheses. This tendency
is evident to a lesser degree in Sequences 12 and 62, but is not very apparent in Sequences 38 and
88, probably because the Bayesian posterior probabilities are not as extreme for these
sequences.

Performance indices (PIs) were computed for each subject on each sequence for each value of
n. An analysis of variance on these PIs gave the results shown in Table IV. Because there was only
one observation in each cell, the error term used in the analysis was the mean square for the
sequences x presentation x dots x subjects interaction.

In interpreting this analysis of variance, it is important to keep in mind that PI is being
examined, so the experimental variables must be understood to affect the degree to which
subjects were successful in approaching Bayesian performance. Three of the main effects are
significant. The first is sequences: subjects are less Bayesian for the more difficult sequences.
The number of dots is mildly significant (P < .05), the trend being toward less Bayesian performance
as the number increases, though there are exceptions for some subjects on some sequences.
For Sequences 28 and 78 all subjects became less Bayesian with more dots; for the other two
sequences the PI peaks sharply and rather irregularly, but there is enough consistency among
subjects to produce a sequences x dots interaction.

TABLE IV. SUMMARY OF ANALYSIS OF VARIANCE

Source                                          df          MS          F
Sequences                                        2    104,525.5     43.79**
Presentation (sequential or nonsequential)       1      5,314.0      2.23†
Number of dots                                  14      5,111.1      2.14*
Subjects                                         5     17,870.4      7.49**
Sequences x presentation                         2      2,554.0      1.07†
Sequences x dots                                28      9,729.1      4.08**
Sequences x subjects                            10     13,835.1      5.80**
Presentation x dots                             14      1,457.3     <1
Presentation x subjects                          5      8,913.2      3.73**
Dots x subjects                                 70      4,167.5      1.75**
Sequences x presentation x dots                 28      2,017.3     <1
Sequences x presentation x subjects             10      9,048.8      3.79**
Sequences x dots x subjects                    140      3,334.3      1.40*
Presentation x dots x subjects                  70      2,107.1     <1
Sequences x presentation x dots x subjects     140      2,386.8

** P < .01
*  P < .05
†  n.s.
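As a consistency check on Table IV, each F ratio can be recomputed by dividing the effect's mean square by the error term named in the text, the four-way interaction mean square of 2,386.8. The sketch below does this with the mean squares as printed; the function name is illustrative, and ratios below 1 are reported as "<1", as in the table.

```python
# Mean squares as printed in Table IV.
ERROR_MS = 2386.8  # sequences x presentation x dots x subjects, df = 140

MEAN_SQUARES = {
    "Sequences": 104525.5,
    "Presentation": 5314.0,
    "Number of dots": 5111.1,
    "Subjects": 17870.4,
    "Sequences x presentation": 2554.0,
    "Sequences x dots": 9729.1,
    "Sequences x subjects": 13835.1,
    "Presentation x dots": 1457.3,
    "Presentation x subjects": 8913.2,
    "Dots x subjects": 4167.5,
    "Sequences x presentation x dots": 2017.3,
    "Sequences x presentation x subjects": 9048.8,
    "Sequences x dots x subjects": 3334.3,
    "Presentation x dots x subjects": 2107.1,
}


def f_ratios(mean_squares, error_ms=ERROR_MS):
    """Return F = MS_effect / MS_error for each source, formatted as in Table IV."""
    table = {}
    for source, ms in mean_squares.items():
        f = ms / error_ms
        table[source] = "<1" if f < 1 else f"{f:.2f}"
    return table


for source, f in f_ratios(MEAN_SQUARES).items():
    print(f"{source:<40} {f}")
```

The recomputed ratios agree with the printed F values to two decimals, which supports treating the four-way interaction as the error term throughout the table.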

Individual differences among subjects are large; thus, the subjects main effect is significant.
For some subjects the method of presentation makes a difference. This is not true for all
subjects, however, so there is a presentation x subjects interaction, but not a main effect due
to presentation. Further, for those to whom presentation condition does make a difference,
the direction of this difference varies according to sequence. The effect of subjects is
apparently strong, for there is no significant sequences x presentation interaction (although the
lack of this interaction may be due to the low number of degrees of freedom).

Sequences x subjects is significant; while some subjects are most nearly Bayesian on the
easiest sequence and least Bayesian on the hardest, others are not. The significance of the triple
interaction, sequences x subjects x dots, undoubtedly results only from the large number of
degrees of freedom.

Some subjects tend to be less Bayesian at the start of a sequence and more Bayesian near
the end, while others reverse this trend. This leads to the dots x subjects interaction.

To summarize, apart from individual differences among subjects, the only highly significant main
effect is that of sequences. This means that highly conflicting, ambiguous information leads to
performance that is less Bayesian than that produced by unambiguous information. Other factors
also influence performance, but are less important.

2.3.4. DISCUSSION. This experiment shows that more ambiguous information produces
less Bayesian performance. It seems likely that ambiguity interacts with the number of
hypotheses considered by the subject. Of course the subject, being conservative, may be
considering as plausible hypotheses that have negligible Bayesian posterior probability; it may be
possible to improve performance in multihypothesis situations by reducing the number of
hypotheses under active consideration as rapidly as the data permit.

The other major finding of the experiment is that sequential vs. nonsequential presentation
of data makes very little difference. This finding is not too surprising. Experiment One
showed that subjects were treating each slide as a separate problem, whether or not it appeared
in an ordered sequence. In this experiment, no subject performed better under sequential than
under nonsequential conditions of presentation; some performed worse. Apparently the difference
in the kind of information processing required makes little difference to performance. Of
course, all the information necessary to calculate valid posterior probabilities is present under
both conditions. If, in the sequential (only one dot on the screen at a time) mode of presentation,
the subject had been required to reset his estimation levers to zero, thus putting a load on his
memory, performance presumably would have deteriorated.

Methodological issues cloud the picture. All subjects were first presented with the three
sequences in which displayed dots accumulate, and then with those in which the dots appear
one at a time. This order may have caused subjects to try to perform the sequential task in
the same manner as the nonsequential one, even though their instructions for the former were to
revise their last lever setting as they gained new information. Since subjects were not told
that the revision was to be based only on the new dot, they could have based their revision on
the new dot and on what they remembered of the previous dots. This suggests the possibility
that memory factors affect performance of the sequential task, and that it is for this
reason that performance there does not consistently differ from performance on the nonsequential
problem.


2.4. OVERALL DISCUSSION

Experiments Two and Three suggest that conservatism is a very pervasive phenomenon,
little affected by different stimulus displays or different response modes. This conservatism in
processing information conforms to our intuition and to our observations. We believe that men
typically want to be more certain than they should want to be, and seek too much information;
that generalization combined with the rules of the game is often enough to play winning poker.
Furthermore, intuition suggests an interaction with payoff: the larger the payoff, the larger
the excess of information that a decision-maker seeks over what he should seek. Anecdotal
observations that people seek too much information have often been attributed to a "desire for
certainty," or to a "dislike of intermediate probabilities," or to a "fear of failure in excess
of desire for success," or to some similar motivational construct. These findings suggest a
different interpretation: people seek too much information not because they want too much
certainty, but rather because they cannot extract from the information they have as much
certainty as it in principle justifies. In other words, the suboptimal behavior may be the
result of intellectual, not motivational, deficiencies.

Two speculations occur to us about the reason for the intellectual deficiencies that lead to
conservatism in information processing. First, the real world is always changing; certain
kinds of hypotheses that seem true today may not be true tomorrow. Thus, evidence about the
truth of a hypothesis in the real world may be misleading, not because the hypothesis was
not true at the time the evidence was collected, but rather because the world has changed since
then. One possible defense against being misled is to resist persuasion, to require large
amounts of evidence before acting. It is not difficult to imagine a learning process for acquiring
that defense; experiences should not be hard to come by in which acting in accord with the weight
of the evidence and being wrong leads to punishment.

A second, similarly speculative explanation of conservatism concerns the dependence
of data. If two data, D_j and D_k, are independent given a hypothesis H_i, then

$$P(D_j \mid H_i) = P(D_j \mid H_i, D_k) \quad \text{and} \quad P(D_k \mid H_i) = P(D_k \mid H_i, D_j)$$

for the hypothesis under consideration. (Note that independence is a relationship
among at least two data and a hypothesis, so that data may be independent given one hypothesis and
dependent given another.) In the real world, data are often not independent of one another.
Moreover, one kind of dependence is far more frequent than any other: repeated observations of the
same datum. Thus when you look around your office and see John there, and a moment later look
again and again see John there, you do not conclude that there are two people in your office;
instead you conclude that John has remained there. Men may be accustomed to discounting the
significance of items of evidence that resemble one another. If so, one might expect that qualitatively
different items of evidence would have more impact on opinion than qualitatively similar items.
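The cost of ignoring this kind of dependence can be seen in the odds form of Bayes's theorem. The sketch below is an illustration under assumed numbers, not an analysis from the report: a datum is taken to be twice as likely under H1 as under H2 (hypothetical values 0.8 and 0.4), and the second observation is taken to be a pure repeat of the first, so that P(D2 | H, D1) = 1 under both hypotheses. Treating the repeat as independent evidence overstates the posterior.

```python
def posterior_h1(prior_h1, likelihood_ratios):
    """Posterior P(H1) for two exclusive hypotheses, given a prior and one
    likelihood ratio P(D | H1) / P(D | H2) per datum (odds form of Bayes's theorem)."""
    odds = prior_h1 / (1 - prior_h1)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


P_D_GIVEN_H1 = 0.8  # hypothetical
P_D_GIVEN_H2 = 0.4  # hypothetical
lr_first = P_D_GIVEN_H1 / P_D_GIVEN_H2  # 2.0

# A second look that merely re-observes the same datum: P(D2 | H, D1) = 1
# under both hypotheses, so its likelihood ratio is 1.
lr_repeat = 1.0 / 1.0

correct = posterior_h1(0.5, [lr_first, lr_repeat])
naive = posterior_h1(0.5, [lr_first, lr_first])  # wrongly treats the repeat as independent
print(round(correct, 3), round(naive, 3))  # 0.667 0.8
```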

In any case, our findings strongly suggest that men should not be required to estimate
posterior probabilities in information-processing systems. If the conservatism in information
processing suggested by these experiments is also reflected in decision-making, questions are
raised about the quality of men's decisions in such cases.




3 

THE EFFECT OF A FLATTENED CONDITIONAL PROBABILITY DISTRIBUTION
ON PROBABILITY ESTIMATION*

In experimental situations where subjects are given a set of hypotheses, prior probabilities
for the hypotheses, and conditional probability distributions for information or data given the
respective hypotheses, the usual finding has been conservatism: subjects change their probability
estimates less than the amount prescribed by Bayes’s theorem.

A series of studies conducted by Harold C. A. Dale (1962, unpublished) approached the
question of probability estimation as a training problem in probabilistic diagnosis. In the Dale
studies, the subject is placed in a simulated war game. He is told that enemy forces may launch any
one of four types of attack, and his task is to estimate the probability of each as he is presented
with a sequence of information concerning enemy activity. For each datum, four different values
of P(D|H) are possible, one for each hypothesis; these values are displayed to the subject. Thus,
this task is very similar to the one reported in Section 2. Here, too, the normative solution is
given by Bayes’s theorem.

Again, subjects were found to estimate conservatively. Several possible explanations for
their conservatism were considered and examined by Dale. If subjects, rather than accepting
the displayed conditional probabilities, operated with a conditional probability matrix that was
flatter (having less variance than the objective display), then the outcome would be the observed
conservatism. If, on the other hand, subjects did not employ the Bayesian multiplication rule
but rather used some sort of addition of probabilities, conservatism would still prevail. A third
possibility is that subjects, while accepting the multiplication rule, make consistent computational
errors.
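The second explanation can be made concrete. The report does not specify the additive rule Dale considered, so the `additive_estimate` function below is a hypothetical stand-in: it averages each hypothesis's likelihoods over the data instead of multiplying them. With three data that each favor H1 (hypothetical likelihoods), the additive rule stays at the one-datum estimate while Bayes's theorem keeps accumulating evidence.

```python
from math import prod


def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def multiplicative_posterior(priors, data_likelihoods):
    """Bayes's theorem: multiply each prior by the likelihood of every datum
    (data assumed conditionally independent given each hypothesis)."""
    return normalize([p * prod(row[h] for row in data_likelihoods)
                      for h, p in enumerate(priors)])


def additive_estimate(priors, data_likelihoods):
    """Hypothetical additive rule: average, rather than multiply, each
    hypothesis's likelihoods across the data."""
    n = len(data_likelihoods)
    return normalize([p * sum(row[h] for row in data_likelihoods) / n
                      for h, p in enumerate(priors)])


priors = [0.25, 0.25, 0.25, 0.25]
# P(datum | H1..H4) for three data, each favoring H1 (hypothetical values).
data = [[0.4, 0.1, 0.2, 0.3]] * 3

bayes = multiplicative_posterior(priors, data)
additive = additive_estimate(priors, data)
print(round(bayes[0], 2), round(additive[0], 2))  # 0.64 0.4
```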

Studies of these possibilities indicate that subjects persist in conservatism even when allowed
to set their own conditional probability distributions and prior probabilities. It also seems that
providing subjects with a demonstration of the multiplication rule and training in its use does
not improve the accuracy of estimation unless subjects are allowed to actually carry out
paper-and-pencil computation.

The persistence of conservatism led to the question of whether a conditional probability
matrix could be constructed that would not result in conservatism; this question gave rise to
the experiment reported here.

*This section was prepared by Melvin Guyer and Ward Edwards.
3.1. INTRODUCTION

In a simulated war game, subjects were instructed to estimate the probabilities of each
of four mutually exclusive hypotheses when provided with data and with conditional probability
displays for the data given the hypotheses. An attempt was made to construct a set of
conditional probability distributions that would lead subjects to revise their probability
estimates more than the amount prescribed by Bayes’s theorem. Achieving this result
would suggest certain explanations of the conservatism found fairly consistently in similar
situations reported in the literature.


3.2.  METHOD 

Each of 20 University of Michigan male undergraduate students was randomly assigned to
one of four experimental groups in a 2 x 2 design. Two sets of conditional probability
matrices were devised. One, hereafter referred to as the "basic" matrix, had the form

          H1     H2     H3     H4
   e1    0.40   0.10   0.20   0.30
   e2    0.10   0.40   0.10   0.30
   e3    0.20   0.10   0.10   0.10
   e4    0.20   0.10   0.40   0.10
   e5    0.10   0.30   0.20   0.20

where H1 through H4 were a set of mutually exclusive hypotheses concerning the form of a possible
enemy attack, and e1 through e5 were a set of possible messages that the subject might
receive and whose impact on the probabilities of the hypotheses he would have to estimate. Each
column gives P(e | H) for one hypothesis and sums to 1. A second matrix, called the degraded
matrix, was constructed by adding a constant of 2.00 to each value in the basic matrix and then
normalizing (dividing each shifted value by its column total of 11, so that each column again
sums to 1). The degraded matrix had the following form:


          H1     H2     H3     H4
   e1    0.22   0.19   0.20   0.21
   e2    0.19   0.22   0.19   0.21
   e3    0.20   0.19   0.19   0.19
   e4    0.20   0.19   0.22   0.19
   e5    0.19   0.21   0.20   0.20

The labeling for the hypotheses and the messages was, of course, the same for both the basic and
the degraded matrices.
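The construction of the degraded matrix, and its effect on the normative (Bayesian) revision, can be sketched as follows. The column values are those of the basic matrix; the `degrade` and `posterior` helpers and the run of five e1 messages are illustrative assumptions, not the message sequences used in the experiment (which are not reproduced here). Rounded to two decimals, `degrade` reproduces the degraded matrix, and the same messages move the Bayesian posterior far less under it.

```python
# Columns of the basic matrix: P(e1..e5 | H) for each hypothesis.
BASIC = {
    "H1": [0.40, 0.10, 0.20, 0.20, 0.10],
    "H2": [0.10, 0.40, 0.10, 0.10, 0.30],
    "H3": [0.20, 0.10, 0.10, 0.40, 0.20],
    "H4": [0.30, 0.30, 0.10, 0.10, 0.20],
}


def degrade(column, constant=2.0):
    """Add a constant to each conditional probability, then renormalize."""
    shifted = [p + constant for p in column]
    total = sum(shifted)  # 1 + 5 * constant = 11 for the values above
    return [p / total for p in shifted]


DEGRADED = {h: degrade(col) for h, col in BASIC.items()}


def posterior(matrix, messages):
    """Bayesian posterior over the hypotheses, starting from equal priors,
    after a list of message indices (0 stands for e1)."""
    hypotheses = list(matrix)
    probs = [1 / len(hypotheses)] * len(hypotheses)
    for m in messages:
        probs = [p * matrix[h][m] for p, h in zip(probs, hypotheses)]
        total = sum(probs)
        probs = [p / total for p in probs]
    return dict(zip(hypotheses, probs))


for h, col in DEGRADED.items():
    print(h, [round(p, 2) for p in col])

messages = [0] * 5  # hypothetical: e1 received five times
print(round(posterior(BASIC, messages)["H1"], 2))     # 0.79
print(round(posterior(DEGRADED, messages)["H1"], 2))  # 0.34
```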

The matrices were displayed to the subjects as sets of bar graphs, one for each hypothesis,
on sheets of white cardboard. Each graph was labeled so that the probability values
could be read easily.

Additional apparatus included a "map" of a supposed enemy terrain with various strategic
areas demarcated, showing the location of an agent who would be the source of messages
concerning enemy activity. The subject was also given a chip board and 100 metal chips. The
chip board was made up of four columns, each with ten troughs capable of holding ten chips.

Each column of the board was labeled for one of the hypotheses, and the number of chips placed
in a column by the subject indicated the subject's estimate of the probability of that hypothesis.

Each of the subjects was run on both matrices, the assignment to order of matrices being
random, and matrix order being an experimental treatment. Since each subject was to be run
on both, it was necessary to construct two different sequences of ten messages each. The
sequences not only had different orders of messages but also provided evidence for different
hypotheses. The probability values of the respective hypotheses for each sequence were quite
similar, and the values at the end points of the sequences were almost identical. The assignment
to sequence order was random, and was also an experimental treatment.

Each subject was seated before the map of enemy terrain with the conditional probability
matrix displayed and the chip board close at hand. Initially the chips were distributed equally
among the four columns, and the subject was told that the present state of our knowledge
concerning enemy activity justified this distribution of chips. The subject was instructed in the
use of the chip board and was told the nature of the task. He was asked to make estimates of the
probability of each hypothesis as messages from the agent came in (the messages were presented
to the subject by the experimenter). The subject made his estimates and then redistributed the
chips among the columns. The experimenter recorded the distribution of probabilities for the
hypotheses after each message. After a subject had been run on the first sequence of messages,
he was given additional instructions to explain the introduction of the second matrix and was then
run on the remaining sequence of messages.

3.3. RESULTS

Figure 11 shows the averaged subjective estimates, using the basic matrix, of the
probability of the hypothesis confirmed by the data. The upper curve represents the Bayesian
values of the posterior probabilities after each message is received. The middle curve
represents the averaged scores for the group first run on the basic matrix, and the lower curve the
averaged scores for subjects run first on the degraded matrix and then on the basic. Of course
the sequence is the same for all curves in Fig. 11.

Figure 12 gives the same information as Fig. 11 except that Sequence 2 was used rather than
Sequence 1.

Figure 13 shows the averaged subjective estimates, based on the degraded matrix, of the
probability of the hypothesis confirmed by the data. The solid curve is the objective estimate,
the upper curve is for estimates made when the basic matrix preceded the degraded, and the
dotted curve is for estimates made when the degraded matrix came first. All curves in Fig. 13
are based on the same sequence of messages to the subject.
Figure 14 provides the same information as Fig. 13 except that it is based on Sequence 2
rather than Sequence 1.

Figure 15 compares the objective probability estimates using the basic matrix for Sequences
1 and 2.

Figure 16 compares the objective probability estimates using the degraded matrix for
Sequences 1 and 2. These last two figures make it possible to compare directly the rate of change
of probabilities for the two sequences. It should be remembered that the sequences increase
the probabilities for different hypotheses, and that each curve is drawn for the hypothesis that
its sequence confirms.

An analysis of variance was done on the subjects' final estimates of the probability of the
hypothesis that tended to be confirmed by the particular data sequence used. A separate analysis
was done for scores on the basic matrix and for scores on the degraded matrix; that is, they
were treated as separate scores, and the order of matrix presentation was taken as an
experimental treatment. The results of the analyses of variance are summarized in Tables V and VI.


TABLE V. SUMMARY OF ANALYSIS OF VARIANCE
OF FINAL ESTIMATES OF PROBABILITY
USING BASIC MATRIX

Source of Variation          df        MS         F       P
Columns (data sequence)       1      520.2      2.13
Rows (matrix order)           1    1,065.8      4.37
  (cells)                     3      836.9
Rows x columns                1      924.8      3.79
Within cells                 16      243.65
Total                        19

TABLE VI. SUMMARY OF ANALYSIS OF VARIANCE
OF FINAL ESTIMATES OF PROBABILITY
USING DEGRADED MATRIX

Source of Variation         df        MS       F      P
Columns (data sequence)      1     68.45    1.09
Rows (matrix order)          1    858.05   13.67    P < .005
(cells)                      3    446.85
Rows × columns               1    414.05    6.59    P < .025
Within cells                16     62.75
Total                       19
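As a consistency check on the tables, each F is the effect mean square divided by the within-cells mean square, and because every effect has 1 df, the 3-df "(cells)" mean square is the average of the three effect mean squares. The sketch below assumes the degraded-matrix columns MS reads 68.45 and its rows × columns F reads 6.59; with those readings the printed values agree to within 0.01 (the report appears to truncate rather than round). Names are illustrative.

```python
# Mean squares as read from Tables V (basic) and VI (degraded matrix).
TABLES = {
    "V (basic)": {"columns": 520.2, "rows": 1065.8,
                  "rows x columns": 924.8, "cells": 836.9,
                  "within": 243.65},
    "VI (degraded)": {"columns": 68.45, "rows": 858.05,
                      "rows x columns": 414.05, "cells": 446.85,
                      "within": 62.75},
}


def f_ratios(ms):
    """F = MS_effect / MS_within for each 1-df effect."""
    return {k: ms[k] / ms["within"] for k in ("columns", "rows", "rows x columns")}


def cells_ms(ms):
    """With 1 df per effect, the 3-df cells MS averages the three effect MS."""
    return (ms["columns"] + ms["rows"] + ms["rows x columns"]) / 3


for name, ms in TABLES.items():
    ratios = {k: round(v, 3) for k, v in f_ratios(ms).items()}
    print(name, ratios, round(cells_ms(ms), 2))
```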
[FIGURE 11. AVERAGED SUBJECTIVE ESTIMATES OF THE PROBABILITY OF THE HYPOTHESIS CONFIRMED BY DATA, USING THE BASIC MATRIX: SEQUENCE 1. Horizontal axis: trial number.]

[FIGURE 12. AVERAGED SUBJECTIVE ESTIMATES OF THE PROBABILITY OF THE HYPOTHESIS CONFIRMED BY DATA, USING THE BASIC MATRIX: SEQUENCE 2. Horizontal axis: trial number.]

[FIGURE 13. AVERAGED SUBJECTIVE ESTIMATES OF THE PROBABILITY OF THE HYPOTHESIS CONFIRMED BY DATA, USING THE DEGRADED MATRIX: SEQUENCE 1. Horizontal axis: trial number.]

[FIGURE 14. AVERAGED SUBJECTIVE ESTIMATES OF THE PROBABILITY OF THE HYPOTHESIS CONFIRMED BY DATA, USING THE DEGRADED MATRIX: SEQUENCE 2. Horizontal axis: trial number.]

[FIGURE 15. OBJECTIVE POSTERIOR PROBABILITY ESTIMATES, USING THE BASIC MATRIX]

[FIGURE 16. OBJECTIVE POSTERIOR PROBABILITY ESTIMATES, USING THE DEGRADED MATRIX]

Separate analyses of variance on scores obtained from the basic matrix and on scores obtained from the degraded matrix were done. The separate analyses preserve the effect of matrix-presentation order as an experimental treatment, and thus do not ignore an important independent variable.

As the primary question raised in the experiment was the possibility of devising a conditional probability matrix that would result in subjects' changing their probability estimates too much, the final estimation scores for the sequences run on the degraded matrix were examined by way of a t-test. The null hypothesis tested was that the difference between the mean of the final estimates and the objective value at that point was zero; since the alternative hypothesis was that of overestimation, a one-tailed test was appropriate. The results of the t-tests are summarized in Table VII.
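A minimal sketch of the one-sample statistic described above, assuming each subject contributes one final estimate and the objective (Bayesian) value is a single fixed number; the function name is hypothetical. The p value would be read from the upper tail of Student's t with n - 1 degrees of freedom, which the sketch leaves to a table or a statistics library.

```python
import math
from statistics import mean, stdev


def one_sample_t(estimates, objective):
    """t statistic for H0: mean(estimate - objective) = 0.

    Under the one-tailed alternative of overestimation, a large positive t
    is evidence against H0; a negative t cannot support that alternative.
    """
    diffs = [e - objective for e in estimates]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)   # sample s.d. / sqrt(n)
    return mean(diffs) / standard_error
```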


TABLE VII. RESULTS OF t-TESTS FOR THE SIGNIFICANCE
OF THE DIFFERENCE BETWEEN MEAN SUBJECT ESTIMATES
AND OBJECTIVE ESTIMATES

Sequence    t Value    P
B-D          3.51      P < .005
D-B          -.40      P < .25


The table clearly indicates that scores on the degraded matrix show a significant overestimate for the relevant hypothesis (where overestimation is taken to be an estimate greater than the normative Bayesian probability) when the degraded matrix is presented after the basic matrix.

3.4. DISCUSSION

Figures 11 and 12 provide some evidence of underestimation of objective probabilities, due to subjects' estimates changing less than is called for by Bayes's theorem. The amount of underestimation seems to be directly related to the order of matrix presentation. Both Figs. 11 and 12 show that the degree of underestimation obtained on the basic matrix when it was preceded by the degraded matrix is of a greater magnitude than that produced by the opposite presentation order. The effect of matrix order on the level of estimation is more dramatically displayed by Figs. 13 and 14, which present the scores for the degraded matrix condition. When the degraded matrix followed the basic, the subjects overestimated the probabilities. When the degraded matrix was presented first, subjects again tended to underestimate the probabilities. The underestimate obtained in this condition attests to the persistence of the phenomenon: the objective probability was 0.32, yet subjects managed to underestimate the relevant hypothesis while still favoring it over the others. The estimates were between 0.25 and 0.32 for seven out of the ten subjects run on this condition.

Since overestimation on the degraded matrix was obtained only when the degraded matrix was preceded by the basic, it seems that the larger magnitude of estimations on the basic matrix introduces a response set that carries over into the degraded condition. This response set also seems to carry over from the degraded to the basic condition, as is indicated by the greater degree of underestimation on the basic matrix when it is preceded by the degraded.

The results of this study suggest that conservatism is found only when high-variance conditional probability displays are used. Data that have relatively low diagnostic value lead subjects to make probability estimates that are very nearly Bayesian. Under these conditions, subjects' faculties for estimating probabilities are not as bad as they would at first seem. It may well be that even for high-variance conditional probabilities subjects estimate probabilities much better than their responses indicate. This possibility gains weight from the difficulty one has in conceiving a situation in which a person behaves as a pure estimator of probability, without taking other decision-making parameters into account. Conservatism may be accounted for in terms of the utilities introduced into the task of estimating probability; while the situation in this study was only a simulated war game, subjects did tend to become engrossed in the task. Their concern with the consequences of their probability estimates could, and undoubtedly did, influence their estimates to a degree. Pursuing this line of thought further, it would seem that experimental manipulation of the utilities inherent in an estimation task would answer some of the questions concerning the ability of humans to behave as "pure" probability estimators.
4

THE ESTIMATION OF CREDIBLE INTERVALS*

A criticism that may be leveled at many of the probabilistic information-processing experiments is that they deal with discrete hypotheses rather than continuous parameters. For example, in the experiments already described, subjects were asked to give the posterior probabilities of discrete hypotheses: they were asked to make point estimates. In the present experiment subjects were asked to estimate a continuous parameter, that is, to give the 90% or 50% credible interval of a posterior distribution. Subjects were presented with a sequence of numbers drawn from a normal distribution with known variance but unknown mean, and after each presentation of a number were required to estimate either a 90% or 50% credible interval for that mean.

It seemed that the conservatism found for discrete hypotheses might reasonably be expected in the estimation of continuous parameters also. Therefore, it was anticipated that the credible intervals given by subjects would not shrink in inverse proportion to the square root of the number of observations, as they should, but would shrink more slowly.


4.1.  METHOD 

4.1.1. SUBJECTS. Five male summer school students at The University of Michigan volunteered to participate in the experiment. They were paid $1.25 per hour.

4.1.2. INSTRUCTIONS TO SUBJECTS. Subjects were asked to make guesses about the average or mean of a set of normally distributed numbers. They were told that they would see a sequence of numbers randomly chosen from that set and that the experimenter was interested in the degree of certainty each new number gave them about the average or mean of the set from which the sequence of numbers was drawn.

The subjects were asked to show their certainty by giving credible intervals within which they were either 50% or 90% sure that the mean would fall. They were told that as they saw more and more numbers they should become increasingly certain about the mean, and thus should be able to make their credible intervals smaller and smaller.

The subjects received instruction about the parameters of a normal distribution and its symmetry. Before seeing any numbers they were told the standard deviation of the population from which the numbers were drawn, and the experimenter set an a priori credible interval within which, without seeing any numbers, they could be 50% or 90% certain the population mean would fall.

*This section was prepared by Marilyn T. Zivian and Ward Edwards, on the basis of data collected by Samuel M. Rubin.
They were informed that there was a perfect performance to which their performance would be compared and that he who performed best would receive an extra payment for participating in the experiment.


4.1.3. SEQUENCES. Three sequences of 64 numbers each, which we will call "original sequences," were generated by selecting numbers at random from a table of random numbers. The numbers came from a normally distributed population with mean zero and standard deviation one. From the original sequences nineteen secondary sequences were generated by multiplying each number in a sequence by one of two standard deviations (5 or 10) and adding to each number one of four mean values (0, 4, 50, or 54). The nineteen secondary sequences were labeled with letters of the alphabet from A to S and were shown in a different random order to each subject. For sequences A to P, subjects were asked to estimate 90% credible intervals; for Q to S, 50% credible intervals.
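The rescaling step can be sketched as below. This is an illustration only: the report drew its standard-normal values from a printed table of random numbers, whereas the sketch substitutes Python's `random.Random.gauss`, and the function names are hypothetical.

```python
import random


def original_sequence(length=64, seed=None):
    """Standard-normal draws (mean 0, s.d. 1); stands in for the printed table."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def secondary_sequence(original, sd, mu):
    """Rescale: multiply each number by sd (5 or 10), then add mu (0, 4, 50, or 54)."""
    return [sd * z + mu for z in original]


# Example: sequence D in Table VIII (original sequence 1, s.d. 5, mean 54)
orig1 = original_sequence(seed=1)
seq_d = secondary_sequence(orig1, sd=5, mu=54)
```

Because every secondary sequence is a linear transform of one of only three originals, sequences built from the same original share the same pattern of ups and downs.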


4.1.4. DISPLAY OF SEQUENCES. The sequences were displayed to the subjects on long rolls of adding machine tape which passed a window in a screen about three feet in front of the subject. A subject was shown a number, he made his estimate, and the next number was rolled into view. Once he saw a number, it stayed in view until all 64 numbers were visible in the window.

4.1.5. PRIOR SETTINGS. When subjects were asked to give 90% credible intervals, the a priori interval set by the experimenter was the interval about the mean equal to μ ± (1.645)(s.d.); for the 50% credible interval estimation, the a priori interval setting was μ ± (0.674)(s.d.).
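The multipliers 1.645 and 0.674 are the standard-normal quantiles that leave 5% and 25% in each tail. The sketch below, using Python's `statistics.NormalDist`, reproduces them and the rounded prior settings of 8, 16, and 7 listed in Table VIII; the helper name is illustrative.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)   # central 90%: about 1.645
Z50 = NormalDist().inv_cdf(0.75)   # central 50%: about 0.674


def prior_half_width(sd, z):
    """Half-width of the a priori interval mu +/- z * sd."""
    return z * sd


for sd in (5, 10):
    print(sd, round(prior_half_width(sd, Z90), 2), round(prior_half_width(sd, Z50), 2))
```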


4.1.6. RESPONSE APPARATUS. The response apparatus consisted of a wooden stand upon which were two pointers that could be moved along a calibrated scale. Different scales could be mounted on the apparatus. Each subject was asked to place the two pointers along a scale to indicate his certainty (50% or 90%) that the population mean lay within the interval he set.

4.1.7. SCALES. For each sequence subjects indicated their credible intervals on one of four scales. Each scale was calibrated in unit intervals.

Scale 1 ranged from -30 to +30. It was used in conjunction with sequences of standard deviation 10 and population mean 0 or 4. Subjects estimated 90% credible intervals on this scale.

Scale 2 ranged from +20 to +80. It was used for sequences of which the population mean was either 50 or 54 and the standard deviation was 10. Subjects estimated both 90% and 50% credible intervals using this scale.

Scale 3 ranged from -15 to +15. It was used for sequences of which the true mean was 0 or 4 and the standard deviation was 5. Subjects estimated 90% credible intervals on this scale.

Scale 4 ranged from +35 to +65. It was used in conjunction with sequences of standard deviation 5 and mean of 50 or 54. Subjects estimated 90% credible intervals using this scale.
Table VIII summarizes the information about the sequences and scales used in the experimental design.


TABLE VIII. SCALES AND SEQUENCES

Secondary  Original  Standard          Credible  Prior
Sequence   Sequence  Deviation  Mean   Interval  Setting   Scale
A          1         5          0      90%       μ ± 8     3
B          1         5          4      90%       μ ± 8     3
C          1         5          50     90%       μ ± 8     4
D          1         5          54     90%       μ ± 8     4
E          1         10         0      90%       μ ± 16    1
F          1         10         4      90%       μ ± 16    1
G          1         10         50     90%       μ ± 16    2
H          1         10         54     90%       μ ± 16    2
I          2         5          0      90%       μ ± 8     3
J          2         5          4      90%       μ ± 8     3
K          2         5          50     90%       μ ± 8     4
L          2         5          54     90%       μ ± 8     4
M          2         10         0      90%       μ ± 16    1
N          2         10         4      90%       μ ± 16    1
O          2         10         50     90%       μ ± 16    2
P          2         10         54     90%       μ ± 16    2
Q          1         10         50     50%       μ ± 7     2
R          2         10         50     50%       μ ± 7     2
S          3         10         50     50%       μ ± 7     2

4.1.8. PROCEDURE. Subjects were run one at a time for five experimental sessions of one hour each. They saw sequences A to P and completed their 90% credible interval estimations before seeing sequences Q, R, and S and making 50% credible interval estimations. Subjects saw the sequences in the random orders given in Table VIII.


4.2.  RESULTS 


The widths of subjects' estimated credible intervals were analyzed by comparing them to the Bayesian interval width. Bayesian intervals were found by calculating (3.29)(s.d.)/√N for the 90% credible intervals and (1.348)(s.d.)/√N for the 50% credible intervals, where N = the number of the sample, or trial number, in the sequence. Plots of the comparisons showed no learning: subjects were no more Bayesian on late sequences than on early sequences. Therefore, the results were combined over all sixteen sequences for which subjects gave 90% credible intervals and over the three sequences for which subjects gave 50% credible intervals. Since plots showed large and consistent individual differences, results were not combined over subjects. Figure 17 shows the results of this analysis. Only Subject Four set intervals equal to or smaller than Bayesian interval widths when giving 90% credible intervals, and only Subjects Three and Four gave intervals equal to Bayesian intervals when setting 50% credible interval widths.

[FIGURE 17. WIDTH OF CREDIBLE INTERVALS AVERAGED OVER SEQUENCES. Panels include Subject Three and Subject Four; horizontal axis: trial number.]
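The Bayesian benchmark can be sketched directly from the two formulas above (3.29 = 2 × 1.645 and 1.348 = 2 × 0.674, so these are full widths). The function name is illustrative. The key property is that quadrupling N halves the Bayesian width; a conservative subject's intervals shrink more slowly than this.

```python
import math

# Full-width multipliers from the text: 3.29 = 2 x 1.645, 1.348 = 2 x 0.674
WIDTH_FACTOR = {90: 3.29, 50: 1.348}


def bayesian_width(sd, n, level=90):
    """Bayesian credible-interval width after n observations: factor * sd / sqrt(n)."""
    return WIDTH_FACTOR[level] * sd / math.sqrt(n)


for n in (1, 4, 16, 64):
    print(n, round(bayesian_width(10, n), 3))
```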

The midpoints of the intervals set by subjects were used as estimates of what the subjects thought the mean of the population to be at every trial. Bayesian means were found by calculating

$$\frac{h_0 \mu_0 + x h}{h_0 + h}$$

where μ_0 = the prior mean, h_0 = the prior precision, x = the value of the sample, and h = the precision of the sampling process. Precisions were defined as the reciprocal of the prior-distribution variance in one case and of the sampling-process variance in the other case. The absolute deviations of a subject's means from the Bayesian means were found and summed at every trial over all 19 sequences. Figure 18 displays the summed deviations from Bayesian means at every eighth trial. A comparison of Fig. 18 with the widths of subjects' estimated credible intervals shows that there is a correlation between the subjects' ability to track the Bayesian mean and the size of the credible intervals they set.
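The formula above gives the Bayesian mean after a single observation. A minimal sketch, assuming it is applied trial by trial with each posterior serving as the prior for the next number (the usual conjugate normal update with known sampling variance); the function names are hypothetical.

```python
def update_normal(prior_mean, prior_precision, x, sampling_precision):
    """One conjugate step: mean (h0*mu0 + h*x) / (h0 + h), precision h0 + h.

    Precision is 1 / variance.
    """
    post_precision = prior_precision + sampling_precision
    post_mean = (prior_precision * prior_mean + sampling_precision * x) / post_precision
    return post_mean, post_precision


def bayesian_means(prior_mean, prior_precision, xs, sampling_precision):
    """Trial-by-trial Bayesian means for a sequence of observations xs."""
    means = []
    m, h = prior_mean, prior_precision
    for x in xs:
        m, h = update_normal(m, h, x, sampling_precision)
        means.append(m)
    return means
```

Applying the update twice gives the same result as pooling both observations at once, which is why the posterior precision simply grows by h on every trial.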

4.3. DISCUSSION

As was expected, the subjects displayed conservatism: in seven of the ten instances examined, they did not reduce their interval widths in inverse proportion to the square root of N, the number of samples, but reduced them more slowly.

However, analyzing the data of the experiment pointed to problems: (1) the subjects might not have distinguished between the concepts of population mean and sample mean; (2) there is no reason why they should have believed that the numbers displayed came from a stationary process; (3) only four population means were used, two of which (0 and 50) were in the center of the scales on which subjects moved their pointers; and (4) at the beginning of each sequence, the pointers were preset by the experimenter to the theoretical size within which, without sampling, one could be 90% or 50% confident that the population mean fell, and the population mean was always at the center of this preset interval.
[FIGURE 18. SUM OF ABSOLUTE DEVIATIONS OF SUBJECTS' MEANS FROM BAYESIAN MEANS]


5

CONSERVATISM IN A VERY SIMPLE PROBABILITY-ESTIMATION TASK*


In the first experiment we have reported, subjects were told that the environment could be in exactly one of four possible states, referred to as hypotheses. A sequence of data, generated under the truth of one hypothesis, was shown to the subjects. After seeing each datum in the sequence, the subjects estimated how probable they thought it was that each of the four hypotheses was the true one. Their estimates were compared with probabilities computed from Bayes's theorem.

The general finding of this study was that the subjects' probability estimates, while highly reliable, were considerably more conservative than those calculated from Bayes's theorem; this led to the postulation that this conservatism resulted from the intellectual difficulty of combining the diagnostic value of each individual datum in order to arrive at a diagnosis of the environment based on all the available data.

In the present study, we hypothesized that the conservatism could be reduced or even eliminated by decreasing the difficulty of the original task. In the new task, only one of two hypotheses could be true, and only two kinds of data were possible. Thus, subjects were presented with sequences of data allowing only two different observations, and only two probability estimates, one for each hypothesis, were required after subjects saw each datum. This is the simplest possible task requiring revision of opinion as new information is presented.

5.1.  METHOD 

5.1.1. PROCEDURE. Subjects were shown one bookbag chosen from among ten bags, all of which were equally likely to be chosen. Each of the ten bags contained 100 poker chips, some red and some blue. Every bag was either a Type R bag, in which red chips predominated, or a Type B bag, in which blue chips predominated. For each type, the preponderant chips were in proportion p while the nonpreponderant chips were in proportion q. Of the ten bags, r were of Type R and b were of Type B. Subjects were told how many of the ten bags were of Type R and how many were of Type B, and they were told the exact proportions p and q.


*This section was prepared by Lawrence D. Phillips and Ward Edwards on the basis of data collected by Richard Norman.
Subjects were told that two hypotheses about the contents of the chosen bag were possible for this experiment:

Hypothesis  R:  The  chosen  bag  was  Type  R. 

Hypothesis  B:  The  chosen  bag  was  Type  B. 

Next, subjects were asked to make intuitive estimates of the probabilities of the two hypotheses. The proportion r/10 will be called the theoretical prior probability of Hypothesis R, P(H_R), and b/10 will be called the theoretical prior probability of Hypothesis B, P(H_B). If the subjects' estimates differed from the theoretical prior probabilities, the experimenter explained that lack of other information made the proportions r/10 and b/10 the best estimates of the prior probabilities. This procedure ensured that all subjects started with the same prior probabilities.

Twenty chips were drawn, one at a time and with replacement, from the chosen bag. After each draw, subjects revised their previous intuitive estimates of the probability that Bag Type R had been chosen and of the probability that Bag Type B had been chosen. This process of selecting one bag at random from ten and then drawing 20 chips from the bag was repeated 24 times; thus, every subject made 20 pairs of estimates for each of 24 sequences. The correct hypothesis, the prior probabilities, and the proportion of predominant chips differed for each sequence, as shown in Table IX.

Only eight different basic sequences of red and blue chips were actually shown to subjects, as can be seen in Table IX. Sequences are apparently difficult to remember; no subject reported noticing the repetition of sequences. These sequences were


1. [illegible] [illegible] FSSSS FSSSF
2. [illegible] [illegible] SSFSS FSSSF
3. FFSSF FSFSS [illegible] SSSSS
4. [illegible] SSSSS SFSSF FFFSF
5. SSFFF SSSSS [illegible] SSSFS
6. [illegible] FSFSS SSSSS FSSSS
7. SSFSS [illegible] [illegible] FSFFS
8. [illegible] FSFSS FSSSS FSSFS

The letters S and F denote "success" and "failure," where a success is defined as the drawing of a chip with the same color as the predominant chips in the chosen bag, and a failure is the drawing of a chip of the other color. The symbol for probability of success is p, and that for probability of failure is q, and p + q = 1.

Sequences were presented to subjects in random order, six sequences per session. Each session lasted for about an hour. Subjects were run individually and were self-paced. Subjects were never told anything about the quality of their estimates, nor were they told which hypothesis was correct for a given sequence.
TABLE IX. EXPERIMENTAL DESIGN

Sequence  Correct       P(H_R)            Basic
No.       Hypothesis    (%)       p       Sequence
1         [illegible]   30        .6      1
2         [illegible]   30        .6      3
3         H_R           40        .6      2
4         H_B           40        .6      4
5         [illegible]   50        .6      3
6         H_B           50        .6      1
7         [illegible]   50        .6      4
8         H_B           50        .6      2
9         H_B           60        .6      2
10        H_B           60        .6      4
11        H_R           70        .6      1
12        H_B           70        .6      3
13        H_R           30        .7      6
14        H_B           30        .7      8
15        H_R           40        .7      5
16        [illegible]   40        .7      7
17        H_R           50        .7      8
18        H_B           50        .7      6
19        H_R           50        .7      5
20        H_B           50        .7      7
21        H_R           60        .7      7
22        H_B           60        .7      5
23        H_R           70        .7      6
24        H_R           70        .7      8


5.1.2. SUBJECTS. Five male undergraduates of The University of Michigan served as subjects. They were paid $1.25 per hour.


Theoretical probabilities for each sequence can be calculated from Bayes's theorem:

$$P(H_R \mid D) = k\,P(D \mid H_R)\,P(H_R) \qquad (1)$$

$$P(H_B \mid D) = k\,P(D \mid H_B)\,P(H_B) \qquad (2)$$

P(H_R) and P(H_B) represent the prior probabilities of the two hypotheses; P(H_R | D) and P(H_B | D), the posterior probabilities, or the probabilities of the hypotheses after observing the datum D; and P(D | H_R) and P(D | H_B), the likelihoods of the datum, or its conditional probabilities given the truth of the particular hypothesis. A normalizing constant k ensures that

$$P(H_R \mid D) + P(H_B \mid D) = 1.$$
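For the bookbag task, the likelihood of a single red chip is p under one hypothesis and q under the other. A minimal sketch of Eqs. 1 and 2 with the normalizing constant k written out; the function name and argument order are illustrative.

```python
def posterior(prior_r, lik_r, lik_b):
    """Return (P(H_R | D), P(H_B | D)) from Eqs. 1 and 2."""
    prior_b = 1.0 - prior_r
    # k normalizes the two products so that the posteriors sum to 1
    k = 1.0 / (lik_r * prior_r + lik_b * prior_b)
    return k * lik_r * prior_r, k * lik_b * prior_b


# One red chip from a 70-30 bag with equal priors:
# P(red | H_R) = .7, P(red | H_B) = .3
post_r, post_b = posterior(0.5, 0.7, 0.3)
```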


A form of Bayes's theorem more convenient for analyzing the data can be obtained by dividing Eq. 1 by Eq. 2 whenever H_R is the correct hypothesis, and Eq. 2 by Eq. 1 whenever H_B is the correct hypothesis. This gives

$$\Omega_1 = L\,\Omega_0 \qquad (3)$$

where Ω_1 represents the posterior odds in favor of the correct hypothesis; Ω_0, the prior odds in favor of the correct hypothesis; and L, the likelihood ratio of the datum.

Since each draw of a chip is generated by a binomial process, with probability of success equal to p, the probability of getting s successes in n draws is proportional to p^s q^(n-s). Thus, the likelihood ratio of the datum is

$$L = \frac{p^{s} q^{n-s}}{q^{s} p^{n-s}} = \frac{p^{2s-n}}{q^{2s-n}} = \left(\frac{p}{q}\right)^{2s-n} \qquad (4)$$


Of course, 2s - n = s - (n - s) = s - f is the difference between the number of successes and failures, so Eq. 4 can be written

$$L = \left(\frac{p}{q}\right)^{s-f} \qquad (5)$$


Rewriting Eq. 5 in log form gives

$$\log L = (s - f)\,\log\frac{p}{q} \qquad (6)$$

This form is convenient because, for given values of p and q, log L varies linearly with s - f. Figure 19 shows a plot of log_10 L as a function of s - f. Two plots are shown, one for each value of p used in this experiment.
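Equations 3 through 6 combine into a short computation: count successes and failures, form the likelihood ratio from s - f, multiply by the prior odds, and convert back to a probability. The sketch below assumes base-10 logarithms, matching the log_10 L plotted in Fig. 19; the function names are hypothetical. Because only s - f enters, any two sequences with the same counts give the same posterior.

```python
import math


def log10_likelihood_ratio(s_minus_f, p):
    """Eq. 6: log10 L = (s - f) * log10(p / q), with q = 1 - p."""
    return s_minus_f * math.log10(p / (1.0 - p))


def posterior_correct(seq, p, prior_correct):
    """Posterior probability of the correct hypothesis after a string of
    'S' (success) and 'F' (failure) draws, via Eq. 3: Omega_1 = L * Omega_0."""
    s_minus_f = seq.count("S") - seq.count("F")
    prior_odds = prior_correct / (1.0 - prior_correct)
    post_odds = 10 ** log10_likelihood_ratio(s_minus_f, p) * prior_odds
    return post_odds / (1.0 + post_odds)
```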
The log likelihood ratios computed from Eq. 6 and shown in Fig. 19 are theoretical values. Log likelihood ratios inferred from subjects' estimates can also be computed and compared to the theoretical values. First, subjects' estimates were converted to posterior odds. Then, since the prior probabilities were given, inferred likelihood ratios can be calculated from this logarithmic form of Eq. 3:

$$\log L = \log \Omega_1 - \log \Omega_0 \qquad (7)$$

Plotting subjects' likelihood ratios as a function of Bayesian likelihood ratios allows actual performance to be compared with theoretical performance. This has been done in Fig. 20 for Subject One, for the data obtained in sequences with p = .7. Plots for data obtained when p = .6 gave nearly identical results, so are not shown here. The scatterplots of all subjects except Subject Four were similar to that of Subject One; Subject Four's showed greater scatter.
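Equation 7 can be sketched as follows, treating the subject's estimate for the correct hypothesis as the posterior probability. The function name is hypothetical; an estimate of exactly 0 or 1 gives infinite odds and would need separate handling.

```python
import math


def inferred_log10_lr(estimate_correct, prior_correct):
    """Eq. 7: log L = log Omega_1 - log Omega_0, using the subject's
    probability estimate for the correct hypothesis as the posterior."""
    post_odds = estimate_correct / (1.0 - estimate_correct)
    prior_odds = prior_correct / (1.0 - prior_correct)
    return math.log10(post_odds) - math.log10(prior_odds)
```

Plotting this value against Eq. 6 for the same trial gives one point of a scatterplot like Fig. 20; a slope below 1 indicates conservatism.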

Another  way  to  summarize  these  data  is  to  determine  what  bookbag  compositions  would  be 
necessary  for  Bayes's  theorem  to  give  probabilities  identical  to  those  estimated  by  the  subjects. 
This  has  been  done  graphically,  and  the  results  are  given  in  Table  X. 

TABLE X. RANGE OF p VALUES THAT WILL YIELD
BAYESIAN PERFORMANCE IDENTICAL
TO SUBJECTS' ESTIMATES

              True Value of p
Subject        .6          .7
1           .51-.55     .50-.55
2           .50-.54     .50-.56
3           .52-.56     .51-.59
4           .50-.60     .50-.69
5           .50-.53     .50-.54

For example, the data generated by Subject One when he saw a 70-30 bookbag could have been generated by Bayes's theorem using values of p which ranged from .51 to .55.
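Table X inverts Eq. 6: given the log likelihood ratio inferred from a subject's estimate and the observed s - f, it asks which bag composition p would make Bayes's theorem agree. The report did this graphically; the following is a minimal point-wise sketch of the same inversion, with a hypothetical function name. The ranges in Table X presumably reflect the spread of such values across trials.

```python
def implied_p(inferred_log10_lr, s_minus_f):
    """Bag composition p for which Eq. 6 reproduces the inferred log10 L.

    Undefined when s - f = 0, since then every p gives log L = 0.
    """
    if s_minus_f == 0:
        raise ValueError("s - f must be nonzero")
    ratio = 10 ** (inferred_log10_lr / s_minus_f)   # implied p / q
    return ratio / (1.0 + ratio)
```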

5.3.  DISCUSSION 

Despite the simplicity of this task, subjects' estimates were still conservative compared to probabilities computed from Bayes's theorem. Apparently, the conservatism found in Experiment One is not entirely caused by the complexity of that task.

Table X indicates that the amount of conservatism is very little affected by the two values of p used in this experiment. Possibly this is caused by presenting sequences in random order. If all the .6 sequences had been presented together, and all the .7 sequences together, perhaps the inferred likelihood ratios would have differed more.
Finally, four of the five subjects show considerable consistency, as is indicated by the low degree of scatter in their scatterplots. Behavior in this simple task can best be described as reliable and consistent, but very conservative when compared to Bayes's theorem.

A very simple model gives a good fit to these data. It supposes that the subject raises the likelihood ratio to a power less than one before performing the arithmetic of Eq. 3; this is equivalent to saying that he behaves as though the bookbags are nearer 50-50 than they are. While this model is far too crude to be plausible, it fits these data as well as their scatter permits.
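The power model can be sketched in a few lines. The exponent c (0 < c < 1) is a free parameter introduced here for illustration; the report gives no fitted value. The second function shows the equivalence stated above: raising L to the power c is the same as using a bag with p'/q' = (p/q)^c, which lies nearer 50-50.

```python
import math


def conservative_posterior(s_minus_f, p, prior_correct, c):
    """Power model: the subject uses L**c (0 < c < 1) in place of L in Eq. 3."""
    log_l = s_minus_f * math.log10(p / (1.0 - p))
    post_odds = 10 ** (c * log_l) * prior_correct / (1.0 - prior_correct)
    return post_odds / (1.0 + post_odds)


def equivalent_p(p, c):
    """Bag composition whose true likelihood ratio equals L**c: p'/q' = (p/q)**c."""
    ratio = (p / (1.0 - p)) ** c
    return ratio / (1.0 + ratio)
```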
6

RESPONSE MODES AND PROBABILITY ESTIMATION*

Previous research (see preceding sections) has repeatedly demonstrated that subjects exhibit suboptimal behavior when processing probabilistic information, with Bayes's theorem providing the standard.

For the first experiment (Section 2) a pseudomilitary game was presented to subjects who viewed the progressive accumulation of impact points on a display that resembled a radar display (PPI) and, on the basis of these data, made posterior probability estimates about the truth of four hypotheses. The subjects consistently underestimated high probabilities and overestimated low probabilities; they were unable to extract from the information all the certainty about the truth of the hypotheses that was justifiable by Bayes's theorem.

Section  5  reports  a  much  simpler  task  involving  Bayesian  inference.  Despite  the  simplicity 
of  the  task,  subjects  were  also  unwilling  to  commit  themselves  to  extreme  probability  estimates. 
Tasks  for  which  posterior  Bayesian  probabilities  were  greater  than  0.999  elicited  from  subjects 
estimates  between  0.80  and  0.90. 

This conservatism seems sufficiently well established to permit investigation of the effects of other variables on it. L. D. Phillips (unpublished) used the same bookbag and poker chip problem but explored the effect of making payoffs to the subjects contingent upon the accuracy of their posterior probability estimates. Four groups were run: a control group with no payoff, and three payoff groups in which the payoffs had a logarithmic, quadratic, or linear relationship to the probability estimates. All subjects were more conservative than Bayes's theorem; low probabilities were overestimated and high ones were underestimated. The logarithmic and linear payoff groups were more accurate in their estimates than the control group. For some reason, however, the performance of the quadratic payoff group fell below that of the control group.
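The payoff functions Phillips used are not specified here. As a rough sketch only, assume the common two-hypothesis forms in which the subject is paid according to the probability r he assigned to the hypothesis that turns out to be true: log r, 1 - (1 - r)^2, and r, respectively. The code below then finds the report that maximizes expected payoff for a subject whose actual opinion is q. All function names and the exact functional forms are illustrative assumptions, not Phillips's procedure.

```python
import math

# Assumed payoff rules for a two-hypothesis task; r_true is the probability
# the subject gave to the hypothesis that turned out to be true.
def log_payoff(r_true: float) -> float:
    return math.log(r_true)

def quadratic_payoff(r_true: float) -> float:
    return 1.0 - (1.0 - r_true) ** 2

def linear_payoff(r_true: float) -> float:
    return r_true

def expected_payoff(rule, r: float, q: float) -> float:
    """Expected payoff of reporting r for hypothesis A when the subject's
    actual probability for A is q (A and B exclusive and exhaustive)."""
    return q * rule(r) + (1.0 - q) * rule(1.0 - r)

def best_report(rule, q: float, steps: int = 999) -> float:
    """Grid search over reports r = 0.001, 0.002, ..., 0.999 (for steps=999)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda r: expected_payoff(rule, r, q))

if __name__ == "__main__":
    q = 0.8
    for name, rule in [("logarithmic", log_payoff),
                       ("quadratic", quadratic_payoff),
                       ("linear", linear_payoff)]:
        print(f"{name:12s} best report = {best_report(rule, q):.3f}")
```

Under these assumed forms, the logarithmic and quadratic rules are maximized by reporting r = q, whereas the linear rule rewards pushing the report to the extreme. Whether Phillips's payoffs had exactly these properties cannot be determined from the description in the text.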

The major purpose of the present study is to investigate the relative effects of various probability-estimation response modes on performance.

6.1.  METHOD 

6.1.1. SUBJECTS. The subjects were 15 male students of The University of Michigan, randomly assigned (five per group) to one of three experimental groups: PR, VO, and ODL. Those in Group PR made their estimates by distributing 100 washers over two pegs, which forced them to normalize their probability estimates. Subjects in Group VO reported their estimates as verbal odds in favor of the more likely bag. Subjects in Group ODL made their estimates by setting a pointer along a scale on which odds were displayed in logarithmic intervals (Figure 21).

*This section was prepared by Mary Ann Price Swain and Ward Edwards.


FIGURE 21. LOGARITHMIC SCALE FOR SUBJECTS' REGISTERING OF PROBABILITY ESTIMATES
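The three groups answered in different units, so their responses must be placed on a common scale before they can be compared. The minimal sketch below converts a PR washer allocation and a VO or ODL odds statement into log posterior odds. It assumes base-10 logarithms and a two-bag task. The function names are hypothetical, and the ODL pointer setting is assumed to be read off as odds before conversion.

```python
import math

def washers_to_log_odds(on_favored_peg: int, total: int = 100) -> float:
    """PR mode: washers placed on the favored bag's peg, out of `total`."""
    on_other_peg = total - on_favored_peg
    if on_favored_peg <= 0 or on_other_peg <= 0:
        # Putting every washer on one peg corresponds to infinite odds.
        raise ValueError("allocation must leave washers on both pegs")
    return math.log10(on_favored_peg / on_other_peg)

def odds_to_log_odds(in_favor: float, against: float = 1.0) -> float:
    """VO mode, or an ODL pointer setting read as odds 'in_favor to against'."""
    if in_favor <= 0 or against <= 0:
        raise ValueError("odds terms must be positive")
    return math.log10(in_favor / against)

# An 80-20 washer split and a statement of "4 to 1" express the same opinion.
print(washers_to_log_odds(80), odds_to_log_odds(4))
```

With 100 washers, the most extreme finite opinion a PR subject can register is 99 to 1 (log odds of about 2.0). Stated odds have no such ceiling.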

6.1.2. PROCEDURE. This experiment used the bookbag and poker chip paradigm explained above. Subjects were run one at a time, and each was run in two different experimental sessions. The first session used 70-30 bookbags and the second session 60-40 bookbags. Each bag had a prior probability of 0.5. At each session the subject was shown six different 23-chip sequences. Sequences were generated randomly and checked by the experimenter for their "representativeness." Retained sequences always favored the correct point hypothesis over the uniform hypothesis (i.e., that all compositions are equally likely); this requirement is satisfied if

$$(n+1)\binom{n}{s}\,p^{s}(1-p)^{n-s} > 1,$$

where p represents the probability of obtaining a chip of the preponderant color from the chosen bookbag; n, the total number of chips drawn; and s, the number of those chips drawn that are of the color predominant in the bag. Retained sequences also satisfied the Wald-Wolfowitz test for the expected number of runs (alternations of colors) in a given sequence of s preponderant elements and n - s nonpreponderant elements.
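The two screening checks can be sketched as follows. The retention criterion (n + 1) C(n, s) p^s (1 - p)^(n - s) > 1 is read here as the ratio of the sample's probability under the correct point hypothesis to its probability when every composition between 0 and 1 is equally likely. The runs check uses the standard Wald-Wolfowitz mean and variance. The significance cutoff (|z| < 1.96) and all function names are assumptions, because the text does not state which criterion was used.

```python
import math

def point_vs_uniform_ratio(n: int, s: int, p: float) -> float:
    """(n + 1) * C(n, s) * p^s * (1 - p)^(n - s): probability of the sample under
    the point hypothesis divided by its probability under a uniform prior on the
    composition (which gives every value of s probability 1 / (n + 1))."""
    return (n + 1) * math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)

def count_runs(seq: str) -> int:
    """Number of runs (maximal blocks of one color); 'RRBRB' has 4."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def runs_z(seq: str) -> float:
    """Wald-Wolfowitz statistic: (observed runs - expected runs) / std. dev."""
    colors = sorted(set(seq))
    if len(colors) != 2:
        raise ValueError("the runs test needs exactly two colors")
    n1, n2 = seq.count(colors[0]), seq.count(colors[1])
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("sequence too short for the runs test")
    return (count_runs(seq) - mean) / math.sqrt(var)

def is_retained(seq: str, preponderant: str, p: float, z_crit: float = 1.96) -> bool:
    """Hypothetical screening rule combining both checks described in the text."""
    n, s = len(seq), seq.count(preponderant)
    return point_vs_uniform_ratio(n, s, p) > 1.0 and abs(runs_z(seq)) < z_crit

print(is_retained("RRBRRRBRRB", "R", 0.7))  # typical-looking 7-3 sequence
print(is_retained("RBRBRBRBRB", "R", 0.7))  # too many alternations of color
```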

Sequences were drawn and recorded ahead of time. During the experimental session, the experimenter presented the chips to the subject as if he were actually drawing them from a bookbag. Each subject saw the same sequences, although not in the same order. Subjects were required to make an estimate after each draw of the sample; they were never told which was the correct hypothesis, nor were they given any feedback about the accuracy of their estimates.

6.2.  RESULTS 

(For  reasons  that  will  be  given  later,  the  60-40  data  failed  to  yield  any  consistent  results. 
Therefore,  the  analyses  to  be  presented  here  pertain  only  to  estimates  made  in  the  70-30  problem.) 

The logarithmic odds-likelihood ratio form of Bayes's theorem is convenient for data analysis because it makes optimal performance appear linear. (This statement is fully explained under "Results" in Section 5.) Recall that this form is

$$\log L = \log R_1 - \log R_0,$$

where L represents the likelihood ratio of the datum; R_0, the odds before observing that datum; and R_1, the odds after observing the datum. If we assume that the subjects based their estimates on the value of the variable s - f (the number of chips of the preponderant color minus the number of chips of the other color), then we can compute an inferred likelihood ratio for each subject by converting each posterior estimate into the logarithm of its odds and subtracting the log of the prior odds. Figures 22-25 are typical scatterplots of subjects' inferred log-likelihood ratios.
The broken line is the best-fitting regression line that passes through the origin. For all subjects, the regression lines deviate markedly from the line representing perfectly Bayesian performance. Table XI summarizes both group and individual performances. In the table, m is the slope of the regression line; r is the correlation between the s - f value and the inferred log-likelihood ratio; and k is the constant by which one multiplies the slope of Bayes's theoretical line (log p/q, which equals .368 for 70-30 bookbags with base-10 logarithms) to obtain the subject's slope (m).


TABLE XI. SLOPE CONSTANTS, CORRELATION COEFFICIENTS,
AND k VALUES FOR EACH SUBJECT AND GROUP

Group / Subject   m          r          k
PR group          (illeg.)   (illeg.)   (illeg.)
  Subject 1       .062       .829       .169
  Subject 2       .094       .417       .254
  Subject 3       .116       .927       .314
  Subject 4       .053       .836       .145
  Subject 5       .084       .799       .228
VO group          .114       .665       .310
  Subject 1       .083       (illeg.)   .225
  Subject 2       .076       (illeg.)   .207
  Subject 3       .222       .573       .603
  Subject 4       .216       .847       .587
  Subject 5       .117       .945       .318
ODL group         .127       .599       .345
  Subject 1       .099       .796       .268
  Subject 2       .113       .958       .307
  Subject 3       .064       .677       .173
  Subject 4       .278       .842       .756
  Subject 5       .281       .976       .764

(illeg.) = value not legible in this copy.
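The analysis behind Table XI can be sketched as follows. The sketch assumes base-10 logarithms, equal prior odds, and that r is an ordinary Pearson correlation; the text does not say which correlation was used. With base-10 logs, the theoretical slope for 70-30 bags is log10(7/3), approximately .368, and dividing each tabled m by it reproduces the tabled k. All function names are illustrative.

```python
import math

def inferred_log_lr(posterior_prob: float, prior_odds: float = 1.0) -> float:
    """log10 of the odds implied by a probability estimate, minus log10 prior odds."""
    return math.log10(posterior_prob / (1.0 - posterior_prob)) - math.log10(prior_odds)

def slope_through_origin(xs, ys) -> float:
    """Least-squares slope m of the line y = m * x constrained to pass through 0."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def pearson_r(xs, ys) -> float:
    """Ordinary product-moment correlation (assumed to be the r of Table XI)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def k_value(m: float, p: float = 0.7) -> float:
    """k = m / log10(p / q): k = 1 is Bayesian, k < 1 is conservative."""
    return m / math.log10(p / (1.0 - p))

# The VO and ODL group slopes give k values matching the table (.310 and .345).
print(round(k_value(0.114), 3), round(k_value(0.127), 3))
```

A perfectly Bayesian subject would produce inferred log-likelihood ratios equal to (s - f) times log10(7/3). Fitting such data gives a slope of log10(7/3) and therefore k = 1.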

Table XI shows that response modes do affect performance. Both odds groups are superior to the probability estimation group. Furthermore, the ODL group is slightly superior to the VO group.

Another way to analyze these data is to calculate the percentage of improvement in performance shown by the two odds groups over the probability estimation group. Figure 26 shows that by the third draw the VO subjects were 43 percent more accurate than the PR subjects, and the ODL subjects were 60 percent more accurate. As evidence accumulates, all subjects should increase their certainty about the truth of a hypothesis; consequently, by the 20th draw the differential performance of the groups was reduced. At this point, the VO and ODL groups were only 22 and 24 percent more accurate in their estimates, respectively.

FIGURE 26. PERCENT OF IMPROVEMENT SHOWN BY VO AND ODL GROUPS OVER PR GROUP IN ACCURACY OF ESTIMATION


6.3.  DISCUSSION 


This study reconfirms the finding that subjects are conservative in situations involving inference from fallible information: they are unable to extract from the information all the justifiable certainty about the truth of a hypothesis. The roughly linear scatterplots in Figs. 22-24 are characteristic of the majority of the subjects. Occasionally (Fig. 25) a subject shows great variability in his estimates; in this case, the subject told the experimenter that he had changed his strategy in the middle of the session. A simple model that describes the behavior of the majority of subjects is

$$\log L' = k \log L$$

where L' represents the subject's inferred likelihood ratio and L the Bayesian likelihood ratio; the values of k are shown in Table XI.


According to this model, subjects are Bayesian information processors, but they raise every likelihood ratio to a power less than one. Another way of describing this model is that subjects behave as though they do not believe the experimenter's statement about the composition of the bookbags: PR subjects behaved as if they thought the bags were of a 55-45 composition; VO subjects, 56-44; and ODL subjects, 57-43.
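The "as if" compositions follow directly from the model. Raising the true likelihood ratio p/q to the power k is equivalent to replacing p with a composition p' for which p'/(1 - p') = (p/q)^k. The hypothetical helper below assumes 70-30 bags and applies this to the VO and ODL group values of k from Table XI. The PR group's k is not legible in this copy, so it is omitted.

```python
def effective_composition(k: float, p: float = 0.7) -> float:
    """Composition p' that a subject with exponent k behaves as if he believed:
    p' / (1 - p') = (p / (1 - p)) ** k."""
    ratio = (p / (1.0 - p)) ** k
    return ratio / (1.0 + ratio)

for group, k in [("VO", 0.310), ("ODL", 0.345)]:
    pe = effective_composition(k)
    print(f"{group}: about {100 * pe:.1f}-{100 * (1 - pe):.1f}")
```

These values come out at roughly 56.5-43.5 and 57.3-42.7, in line with the compositions quoted in the text. A value of k = 1 returns the true 70-30 composition, and k = 0 returns 50-50.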

In short, subjects degrade the environment in a consistent way, treating each datum as less diagnostic than it actually is.

If subjects are hesitant to commit themselves to extreme estimates, then one would expect those who estimate odds to perform better than those who estimate probabilities, because probabilities have an upper limit of 1.00. Thus, as PR subjects increase their estimates, they also reduce the range of responses remaining to them. Odds have no such upper limit. It is therefore easier for the VO and ODL groups to make larger estimates, since they always have an unlimited range of estimates still available. Moreover, the visual logarithmic display of odds further facilitates making large estimates.

Phillips found that paying subjects for accuracy tended to enhance their performance. The results of this experiment suggest that subjects should estimate posterior odds rather than posterior probabilities in an information-processing task. It would be convenient if the positive effects of payoffs and odds combined additively to influence total performance. That, however, is an experimental question still to be explored.

The data from the 60-40 sequences were not analyzed, for the following reasons. First, three out of five subjects in the VO group and four out of five in the ODL group gave as their odds estimates the ratio between the number of red chips and the number of blue chips presented to them. Second, one subject in PR, two subjects in VO, and one subject in ODL told the experimenter that they felt confused in the 60-40 case because they were still thinking of 70-30 bookbags. The data are consequently ambiguous and difficult to interpret.
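Reporting the red-to-blue ratio as the odds estimate is a confusion, not an approximation. To see why, consider 60-40 bags, assumed here to be symmetric (one bag 60 red/40 blue, the other 40 red/60 blue) with equal priors. The Bayesian odds depend only on the difference between the red and blue counts, while the sample ratio depends on their quotient. The helper names are hypothetical.

```python
def bayesian_odds(red: int, blue: int, p: float = 0.6) -> float:
    """Posterior odds favoring the red-majority bag (equal priors, symmetric bags)."""
    return (p / (1.0 - p)) ** (red - blue)

def sample_ratio(red: int, blue: int) -> float:
    """What the confused subjects reported: red chips divided by blue chips."""
    return red / blue

for red, blue in [(7, 3), (14, 6)]:
    print(red, blue, round(bayesian_odds(red, blue), 2), round(sample_ratio(red, blue), 2))
```

Both samples have the same 7:3 ratio of red to blue. The second sample, however, contains twice as much evidence, and its Bayesian odds are correspondingly far larger (1.5^8 rather than 1.5^4).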
This study should be repeated using more than one Bernoulli probability for the bookbags. Since subjects behave as if the composition of the 70-30 bookbags were in the vicinity of 55-45, it would be of interest to see whether they are more nearly Bayesian information processors when the actual bookbag composition is 55-45. A more extreme p value (0.85 or 0.9) should also be tried, to see whether the differential performance of the two odds groups is maintained. Asymmetric bookbags would further test the various response modes. In any case, careful controls should be used to ensure that subjects do not confuse an odds estimate with the sample ratio of red chips to blue chips.


Appendix  A 

INSTRUCTIONS TO SUBJECTS

Suppose you are in the Air Force and stationed at one of their radar detection stations in Greenland. These stations have large, powerful radars that detect many types of aerial activity: ICBM's, rockets, planes, clouds, sometimes even birds. All of these things may show up on the display, the radar scope. Unfortunately, by the time they are displayed they may look alike: little spots of light on a dark background. Obviously, you have a problem if you happen to have the job of sitting at one of these scopes and trying to figure out which are enemy ICBM's and which are birds. Fortunately, the problem isn't hopeless. For instance, in the example just given, the ICBM's versus the birds, ICBM spots would obviously move faster than birds.

You're not here so we can train you to be a good radar operator in case you should ever find yourself in Greenland; however, the series of experiments in which you are about to participate does concern the problem of evaluation.

Although the information presented to you will be in simplified form, the basic elements of the problem will be very similar to an actual situation. You will play the part of an evaluator: it will be your job to decide among four possible types of airborne activity (POINT TO CONSOLE): enemy, friendly, meteor, or spoof. Enemy activity may be of any sort, an ICBM or rocket, for example. For the purpose of this experiment the specific type of enemy threat is not important. Friendly activity may also be of any sort. Meteors are self-explanatory.

A spoof is a diversionary or probing activity by the enemy, like the cowboy hero who throws his hat in the air to see what the bad guys will do about it.

You are seated at the output display of a complex detection system. This detection system covers a large, circular area that will be subdivided, for this problem, into sectors. This area will be displayed here. (TURN ON SECTOR DISPLAY: A SLIDE WITH NO IMPACT POINTS).

Aerial activity is detected by means of a powerful radar system. Radar information on detected targets is fed to a computer that determines the courses and speeds of the targets and the paths they are following. For this experiment, it will be assumed that the courses and speeds of the targets do not change once detection is made. Once the courses, speeds, and paths of the targets are determined, the computer determines where the targets will land. These points of impact will be displayed on the console within one of the sectors of this land area. Since we obviously don't really have a radar here, a 35-mm slide projector projects this display from the back of the console. (DISPLAY SLIDE WITH SEVERAL IMPACT POINTS). To simplify this experiment, we have not put any dimensions on this circle of land area; just consider it as a land mass on which points of impact are displayed. Remember, computed impact points are being displayed here, not the radar targets themselves.

For each experiment you will be shown fifteen slides. In some experiments the number of impact points will increase with each successive slide. In others, the number of impact points will change erratically with each successive presentation. For both types of experiments, the impact points on one slide all represent the same type of activity. Thus, regardless of whether there are three or thirteen impact points displayed here by one slide, they all represent the same type of activity; that is, they are all enemy, or all friendly, or all meteors, or all spoof, not a combination. However, the two types of experiments differ in this respect: in the series where the number of impact points successively increases, the activity represented on one slide is the same as on the previous slide. In the erratic series, each of the fifteen slides may represent activity different from that on the previous slide.

To summarize, then, there are two types of experiments in which you will be involved. In one type you will first be shown one computed impact point (SHOW), then one more (SHOW), then another (SHOW), and another (SHOW), and so on until fifteen presentations (SHOW) have been made. The impact points on any one of these slides all represent the same activity, and the activity represented by each slide is the same as that on the previous slide. Thus each and all of the slides just shown may have represented friendly activity. In the other type of experiment, you may first be shown, for example, three impact points (SHOW) representing a single kind of activity. The next slide may have eleven impact points (SHOW), again all of the same activity. However, the activity represented by this slide may be different from that of the previous slide. Thus, the previous slide of three impact points may have represented enemy activity, while this one represents meteors. Fifteen presentations will be made for this type of experiment also.

Before you begin each experiment, you will be told whether the displayed impact points represent changing activity or the same activity. Incidentally, slides in both experiments will be of the type you see here, that is, white impact points on a black background. Are there any questions on what is to be displayed?

It will be your problem to decide which of the four types of activity is being displayed by the computed impact points. To help you in this evaluation, five pieces of information will be given to you.

First, we will assume that through advance intelligence you have some estimate of how likely an enemy attack may be. We will limit the experiment to three possible estimates: a 1-in-10 chance of enemy attack, a 1-in-4 chance, or a 2-in-3 chance. That is, you will be told that there is either a 10% likelihood of enemy attack, or a 25% likelihood, or a 67% likelihood.

(SHOW BASE RATES, INSERT 25%). The second piece of information will give you an idea of where an enemy impact point is likely to be. (INSERT ENEMY DISPLAY). This display shows, in percentages and in pie diagrams, the probability that an enemy missile will land in each of the sectors. Here, the probability is highest in this 25% sector and lowest in this 2% sector. In other words, if the impact points are those of an enemy, they are more likely to show up in the sectors with the higher numbers, or with the bigger pie slices. The third, fourth, and fifth pieces of information are similar displays for friendly, meteor, and spoof activity (INSERT THEM WHILE EXPLAINING). You will notice that there is a rough pattern to each of these possible types of activity. (POINT TO PATTERNS). Enemy attack generally would come from this direction; friendly activity would more likely be concentrated in this area; meteors would probably be found here; spoof activity would tend to be in this area.

One important point should be mentioned now. Although the pie diagrams are shown near the center of each sector, the percentage each represents applies evenly to the whole sector (POINT TO 5% SECTOR). In other words, this 5% value applies evenly to this whole sector.

Thus the dividing line between sectors represents a sharp change in likelihood; there is no gradual shading from one likelihood to another. Remember, then, each sector is of constant likelihood.

In summary, you will evaluate the type of activity represented by a set of impact points.

Five pieces of information will be available to use as you wish: the likelihood of enemy attack; the likelihood that, if enemy activity is being observed, the computed impact points would appear in certain sectors; and similarly for friendly activity, meteors, and spoofs. You will make an evaluation after the display of each slide. Thus, for one experiment, you will make fifteen evaluations. (CHANGE TO BLANK SLIDE).

Your decisions will be made with the levers on the console. The numbers to the left of each lever indicate your estimate of the likelihood that the impact points represent the corresponding type of activity. The lower end, near zero, represents very low likelihood; the upper end, near one, represents very high likelihood. If you set the ENEMY lever to .6 (SET LEVER), this means you estimate that there is a 60% probability, or likelihood, that the impact points shown here represent enemy activity (RETURN LEVER TO ZERO). After the first slide has been displayed, make your evaluation of the type of target represented by the impact point or points. Indicate your probability estimates by moving the levers to the appropriate levels.

For instance, if you moved the levers to .6, .1, .25, and .05 (MOVE LEVERS ACCORDINGLY), this would indicate that you believe there is a 60% probability that the impact points represent enemy activity, a 10% probability they are friendly, a 25% probability they are meteors, and a 5% probability they are spoof.

Let's look at what I've just said from a slightly different point of view. Before you start an experiment, the best estimate we have of the probability of enemy attack is this advance intelligence statement. All we know is that there is a 25% probability that enemy missiles will appear. Additionally, this display (POINT TO ENEMY DISPLAY) tells us that if the enemy attacks, his missiles are likely to fall in this way, and similarly for the other three types of activity.

So you see, we're dealing with three types of probability estimates. One is given before the experiment starts: it is a statement of what to expect. Another, shown on these cards (POINT TO P(D|H) DISPLAYS), says, "If it happens, the impact points are likely to fall like this (POINT TO ENEMY DISPLAY), and if it doesn't happen, the impact points are likely to fall like this (POINT TO ANY OTHER DISPLAY)." And the third is your estimate of what is actually happening.

Now,  are  there  any  questions  so  far? 

The console is operated by this white button. When the green light is lit, pushing the button will cause the display here to be revealed. I have already done this. Then you make your evaluation and set the levers. When you are finished, push the button. Go ahead, try it. The red light comes on, indicating that the lever settings are being recorded on a special recorder behind the console. You mustn't move the levers while the red light is on. When the lever settings have been recorded, the yellow light comes on. This is a signal for you to reset the levers to zero. Try it. When they are all reset, the green light comes on. If the yellow light stays on, check the position of all four of these levers again, as well as these extra two. The zero point is quite sensitive, and sometimes the levers are jarred off this position.

As soon as the green light comes on, you can push the button again to reveal the new impact-point display. Now try the sequence for yourself. Make a meaningless evaluation. (WHEN GREEN LIGHT COMES ON, STOP SUBJECT). Notice that if you accidentally move one of the levers off the zero position before a new slide comes on, the green light will blink. Resetting the offending lever will cure the situation.

Finally, you don't have to count the number of slides in the experiment. The screen will show all black when you are finished. (TURN ON BLANK SLIDE). When this happens, let me know. I'll be in the next room. There is no time limit on any of these experiments, but after running through a few sets you should be able to complete a set of fifteen slides in less than fifteen minutes.

Now, are there any questions on any aspect of the experiment? For this first set, I'll stay here with you to answer any other questions that may come up as you work the console.
Appendix B
PUBLICATIONS

This appendix summarizes publications already produced under Contract AF 19(604)-7353. It confines itself to publications that have appeared in journals or as technical documentary reports, or that have been accepted and are scheduled to appear in some such form, plus one Ph.D. thesis. In the course of Contract AF 19(604)-7353, approximately 25 speeches reporting contract research were given at various formal and informal meetings. Although the more formal speeches also qualify as publications, no attempt is made in this report to list them. The technical content of every speech parallels some written technical documentary report.


This list of publications is an important complement to the present report, since no attempt has been made in the final report itself to repeat already-published ideas. The body of the final report is devoted only to the presentation of materials not yet published.

1. Edwards, W., "Dynamic Decision Theory and Probabilistic Information Processing," Human Factors, 1962, 4, 59-73.

This paper is essentially a program review as of 1961. The development of a dynamic decision theory will be central to the impending rapid expansion of research on human decision processes. In a taxonomy of six kinds of decision problems, five require a dynamic theory in which the decision maker is assumed to make a sequence of decisions, basing decision n + 1 on what he learned from decision n and its consequences. Research in progress on information seeking, intuitive statistics, sequential prediction, and Bayesian information processing is reviewed to illustrate the kind of work needed. The relevance of mathematical developments in dynamic programming and Bayesian statistics to dynamic decision theory is examined. A man-computer system for probabilistic processing of fallible military information is discussed in some detail as an application of these ideas and as a setting and motivator for future research on human information processing and decision making.

2. Edwards, W., "Men and Computers," in R. M. Gagne (Ed.), Psychological Principles in Systems Development, Holt, Rinehart and Winston, 1962, 75-113.

This expository chapter explains what a computer is and how it works, discusses programming and programming languages, reviews the technology of the man-computer interface, and illustrates real-time, on-line use of computers in a hypothetical information-processing system.




3. Hays, W. L., "On Lattice Models for Psychological Scaling," Psychometrika, in press.


4. Edwards, W., Probabilistic Information Processing in Command and Control Systems, ESD-TDR-62-34[?], IST Report No. [illegible], University of Michigan Institute of Science and Technology, Ann Arbor, 1963, [page count illegible].

This is the basic document about PIP. It discusses the diagnostic function in command and control systems, presents Bayes's theorem, and examines its role in the design of command and control systems that probabilistically process fallible information. After summarizing existing relevant experimentation, the report points out major unsolved technical problems and outlines a program of research for solving some of them.

5. Edwards, W., Lindman, H., and Savage, L. J., "Bayesian Statistical Inference for Psychological Research," Psychol. Rev., 1963, 70, 193-242.

Bayesian statistics, a currently controversial viewpoint concerning statistical inference, is based on a definition of probability as a particular measure of the opinions of ideally consistent people. Statistical inference is modification of these opinions in the light of evidence, and Bayes's theorem specifies how such modifications should be made. The tools of Bayesian statistics include the theory of specific distributions and the principle of stable estimation, which specifies when actual prior opinions may be satisfactorily approximated by a uniform distribution. A common feature of many classical significance tests is that a sharp null hypothesis is compared with a diffuse alternative hypothesis. Often evidence that, for a Bayesian statistician, strikingly supports the null hypothesis leads to rejection of that hypothesis by standard classical procedures. The likelihood principle emphasized in Bayesian statistics implies, among other things, that the rules governing termination of data collection are irrelevant to data interpretation. It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience.

6. Edwards, W., "Probabilistic Information Processing by Men, Machines and Man-Machine Systems," in Proceedings of the XVIIth International Congress of Psychology (Washington, August 23, 1963), North Holland Pub. Co., Amsterdam, [year illegible].

This is a speech covering much the same material as the immediately following reference; a three-page abstract of the speech will be published in the proceedings of the Congress.

7. Edwards, W., and Phillips, L. D., "Man as Transducer for Probabilities in Bayesian Command and Control Systems," in G. L. Bryan and M. W. Shelly (Eds.), Human Judgments and Optimality, Wiley & Sons, New York, 1964.

This chapter, a more recent discussion of PIP, presents a fairly specific proposal for the design of a class of systems which, by using human judgment in a rather unconventional way, should be able to make more nearly optimal decisions than do present systems intended for the same purpose. It supports the proposal by reporting an experiment which shows that men required to draw conclusions from fallible data do it poorly enough to leave room for vast improvement.

8. Edwards, W., "The Design and Evaluation of Probabilistic Information Processing Systems," Proceedings of the Fifth National Symposium on Human Factors in Electronics, May 5-6, 1964, San Diego, California, Professional Technical Group on Human Factors in Electronics, Institute of Electrical and Electronics Engineers, 1964, pp. 169-181.

A major task of a command and control system is often to determine what is happening in its environment. Conclusive information is usually lacking, so such systems must attempt to synthesize thousands of items of information, each individually worth little, into an accurate picture or diagnosis of the relevant environment. Current systems (e.g., the NORAD Combat Operations Center) use sophisticated display and information retrieval devices, but leave to unaided human judgment the task of synthesis followed by decision.

The ideas of Bayesian statistics offer the basis for a new technology of diagnostic
information processing. In the Bayesian view, probabilities are orderly or consistent
opinions, and Bayes's theorem of probability theory is the optimal rule for revising opinion
on the basis of information. The crucial input to Bayes's theorem is the probability, for
each datum to be processed and for each hypothesis of interest, that the datum would occur
if the hypothesis were true. Research suggests that experts can estimate such probabilities,
or numbers that can be transformed into them, with fair accuracy. Once such probabilities
are available, a desk calculator or computer can easily synthesize them into a posterior
distribution that gives the current probability of each hypothesis of interest on the basis
of all the available data.
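The synthesis step described in this annotation can be sketched in a few lines. This is a minimal illustration, not the PIP design itself: it assumes the data are conditionally independent given each hypothesis (so the per-datum probabilities can simply be multiplied), and the function name `pip_posterior`, the hypothesis labels, and all numbers in the usage lines are hypothetical.

```python
from typing import Dict, List


def pip_posterior(prior: Dict[str, float],
                  likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Revise a prior distribution one datum at a time by Bayes's theorem.

    Each element of `likelihoods` maps every hypothesis H to P(datum | H),
    the quantity experts would be asked to estimate. Assumes (hypothetically)
    that the data are conditionally independent given each hypothesis.
    """
    posterior = dict(prior)
    for lik in likelihoods:
        # Multiply each hypothesis's current probability by P(datum | H) ...
        unnormalized = {h: p * lik[h] for h, p in posterior.items()}
        # ... then renormalize so the probabilities again sum to 1.
        total = sum(unnormalized.values())
        posterior = {h: w / total for h, w in unnormalized.items()}
    return posterior


# Hypothetical example: two hypotheses and three data.
prior = {"H1": 0.1, "H2": 0.9}
data = [{"H1": 0.6, "H2": 0.3},
        {"H1": 0.7, "H2": 0.4},
        {"H1": 0.5, "H2": 0.5}]
print(pip_posterior(prior, data))  # H1 ends near 0.28, H2 near 0.72
```

Renormalizing after every datum is not required mathematically (only the final normalization matters), but it keeps the running posterior interpretable at each step and avoids underflow when many small probabilities are multiplied together.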

Details of the design of such a probabilistic information processing system (PIP) are
presented. Laboratory research completed and in progress is reviewed, along with simulation
studies intended to compare PIPs with traditional information processing systems in complex
and realistic environments.

9. Edwards, W., "Optimal Strategies for Seeking Information: Models for Statistics, Choice
Reaction Times and Human Information Processing," J. Math. Psych., 1965, in press.

Models for optional stopping in statistics are also normative models for a variety of tasks
in which subjects may purchase risk-reducing information before making a decision. This
paper develops a Bayesian model for optional stopping in the continuous case with two
hypotheses. It takes explicit account of the cost of information, the values of the possible
outcomes of the final decision, and the prior probabilities of the hypotheses. Extensive
tables of numerical solutions to the model's transcendental equations are provided.
Two models for choice reaction time are derived. One is based on the normality assumptions
of signal detectability theory; the other is nonparametric. The two are formally identical,
so in this case the normality assumptions are superfluous. The nonparametric model makes
strong predictions about times and errors; it has only one quantity that is not directly
observable.

A second example uses the nonparametric model to design and predict the results of a
binomial information-purchase experiment.
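The general flavor of optional stopping with two hypotheses and binomial data can be illustrated with a short sketch. It is a deliberate simplification, not the model of the paper: the continuous case, information costs, and payoffs are omitted, and the stopping boundaries `upper` and `lower` on the posterior probability of H1 are assumed to be given rather than derived from costs and values. The function name and parameters are hypothetical. Accumulating log likelihood ratios until a fixed boundary is crossed is the form of Wald's sequential probability ratio test.

```python
import math
from typing import Callable, Tuple


def sequential_sample(draw: Callable[[], bool], p1: float, p2: float,
                      prior1: float = 0.5, upper: float = 0.95,
                      lower: float = 0.05, max_n: int = 1000) -> Tuple[float, int]:
    """Buy binomial observations until P(H1 | data) leaves (lower, upper).

    H1 says P(success) = p1; H2 says P(success) = p2. The boundaries are
    assumed inputs here, not derived from information costs or payoffs.
    Returns the final posterior probability of H1 and the number of draws.
    """
    log_odds = math.log(prior1 / (1 - prior1))   # prior log odds of H1
    llr_success = math.log(p1 / p2)               # evidence from one success
    llr_failure = math.log((1 - p1) / (1 - p2))   # evidence from one failure
    hi = math.log(upper / (1 - upper))            # boundary: accept H1
    lo = math.log(lower / (1 - lower))            # boundary: accept H2
    n = 0
    while lo < log_odds < hi and n < max_n:
        log_odds += llr_success if draw() else llr_failure
        n += 1
    return 1 / (1 + math.exp(-log_odds)), n


# Hypothetical run in which every observation is a success.
draws = iter([True] * 10)
print(sequential_sample(lambda: next(draws), p1=0.7, p2=0.3))  # stops after 4 draws, P(H1) about 0.967
```

With p1 = 0.7 and p2 = 0.3, each success multiplies the odds in favor of H1 by 7/3, so four successes carry the posterior from 0.5 past 0.95 (odds 2401/81, about 29.6, against a boundary of 19).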

10. Slovic, S. P., "Value as a Determiner of Subjective Probability," unpublished doctoral
dissertation, University of Michigan, Ann Arbor, 19^.

The purpose of this study was to explore the manner in which judged probabilities of events
are influenced by the desirability of these events.

Subjects were shown five bags, each containing 100 poker chips. They were told that one bag
contained 30 red chips, one contained 40, one 50, one 60, and one 70; the remaining chips in
each bag were blue. Subjects could not tell which bag was which. The subjects selected one of
the bags, and the experimenter proceeded to draw a sample of 50 chips from it, one at a time,
with replacement. Subjects observed the sample and, at various times, made direct probability
estimates for each of the five possible compositions of the bag. They were told that a
monetary payoff would be given to them, regardless of their probability estimates, depending
on what the true contents happened to be. The table below shows the assignment of payoffs to
bags for each group.


                                  True Composition of Bag
Groups                    30 Red    40 Red    50 Red    60 Red    70 Red
I and reward I              $0        $0        $0        $0        $0
II and reward II          lose $1   lose $5     $0      win $5    win $1
III and reward III        lose $5   lose $1     $0      win $1    win $5
Warned II                 lose $1   lose $5     $0      win $5    win $1

Groups I, II, and III, together with a warned version of Group II, constituted Experiment I.
The warned group differed from Group II by having received a brief warning not to allow the
values to bias their estimates. None of these groups were rewarded for the accuracy of their
probability estimates. Reward versions of Groups I, II, and III constituted Experiment II;
these groups were rewarded for accurate estimation. Group I and its reward version were
control groups for whom all hypotheses had neutral desirability. A trick device enabled the
experimenter to draw the same sample of chips for every group.


The results indicated that the value of an event does affect judgments about its probability.
However, the nature of value biases is rather complicated: it varies systematically among
subjects and among trials. Some subjects in the payoff groups were optimistic; they
consistently gave higher probabilities to the desired events and lower probabilities to the
undesired events than did subjects in the control groups. Others were generally pessimistic.
Despite the consistency of individual differences, value groups showed more optimism (or
pessimism) at some times during the sampling than at others. These differences among trials
were similar in both experiments.

The reward for accuracy did not reduce value biases. Some subjects in the reward versions of
Groups II and III overestimated the probability of the most undesired event so that, if it
did occur, the larger reward for accuracy would reduce their loss.

Bayes's theorem provides a normative model for probability estimation in this task.
Probabilities given by subjects in the control groups were closer to Bayesian probabilities
than were those given by subjects for whom payoffs were associated with the events. The
inferiority shown by members of value groups did not diminish as they accumulated more
information about the bag, and it was not reduced by rewards for accuracy.
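The Bayesian benchmark for this task is easy to compute. The sketch below is an illustration, not the analysis used in the study. It assumes equal prior probabilities of 1/5 for the five bags (the subjects could not tell the bags apart) and independent draws with replacement, so that after r red chips in n draws the posterior for a bag with red proportion p is proportional to p^r (1 - p)^(n - r). The names `bag_posterior`, `expected_payoff`, and the constants are hypothetical; the Group II payoffs are read from the payoff table.

```python
from fractions import Fraction
from typing import List

RED_PER_100 = [30, 40, 50, 60, 70]      # red chips in each of the five bags
GROUP_II_PAYOFFS = [-1, -5, 0, 5, 1]    # dollars, same bag order as above


def bag_posterior(reds: int, draws: int) -> List[Fraction]:
    """Exact posterior over the five bags after `reds` red chips in `draws` draws.

    Assumes equal priors (which cancel) and sampling with replacement.
    """
    weights = []
    for k in RED_PER_100:
        p = Fraction(k, 100)
        weights.append(p ** reds * (1 - p) ** (draws - reds))
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoff(posterior: List[Fraction], payoffs: List[int]) -> Fraction:
    """Expected monetary payoff under the current posterior."""
    return sum(p * v for p, v in zip(posterior, payoffs))


post = bag_posterior(reds=30, draws=50)
print([round(float(p), 3) for p in post])
print(float(expected_payoff(post, GROUP_II_PAYOFFS)))
```

Exact fractions keep the normative probabilities free of rounding when they are compared with subjects' estimates; for much longer samples, working with log probabilities in floating point would be the more practical choice.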

The brief warning given to the warned variant of Group II effectively reduced value biases.
These subjects behaved more like those in Group I than like those in Groups II and III.